Top selling items - revisited in 12c
April last year I blogged about TOP-N reporting using Top selling items as example. In Oracle 12c we now have a new FETCH FIRST syntax, so in this post I'll revisit the Top selling items example showing where and how FETCH FIRST can be used and where you still need to know the analytic function methods.
I use the same tables of items and monthly sales as in last years blog post:
And the same base query to compute total sales for 2011 by item:
Pre-12c we could use analytic function ROW_NUMBER() like this:
For the MOBILE group using ROW_NUMBER gives us just one of the two items that sold 28 pieces.
Same results can be found using 12c FETCH FIRST [number] ROWS ONLY:
Let us look at the explain plan for using ROW_NUMBER analytic function:
And then the explain plan for using FETCH FIRST 3 ROWS ONLY:
They are rather identical. FETCH FIRST [number] ROWS ONLY is internally rewritten to use ROW_NUMBER() function - the difference being in the first plan there is an explicit inline view with filter predicate G.RNK <= 3, while the second plan has a system generated inline view filtering on rowlimit_$$_rownumber.
Using both these methods, we only get one of items iGloves and Cover. Which one will be "random" in the sense that Oracle is lazy and just returns the first one it retrieves by whatever access plan it happens to use.
In order to get consistent results we can decide how to "break the ties" if add some more to the ORDER BY to make it unique. For example we might decide that in case of ties, always return lowest item number first:
Or similarly for 12c:
In order to actually retrieve the tied rows, we can use analytic function RANK() rather than ROW_NUMBER():
In 12c the same can be accomplished using FETCH FIRST [number] ROWS WITH TIES:
WITH TIES syntax works the way that once the desired number of rows has been fetched, if there are any more rows with the same value as the last one, then those rows will also be fetched.
Again we can compare the explain plan of using analytic RANK:
With the explain plan of FETCH FIRST 3 ROWS WITH TIES
Again they are very much alike - WITH TIES is rewritten to use RANK() and there is a filter predicate on rowlimit_$$_rank.
One difference here is that using analytic function RANK allows us to control the output order of the tied rows, because we have two ORDER BY clauses:
The ORDER BY within the analytic function is non-unique, therefore we get our ties. But we can guarantee consistent results by making the final ORDER BY in the select "unique."
This is not possible with FETCH FIRST ROWS WITH TIES syntax as we only have one ORDER BY clause, so the order that iGloves and Cover will be presented is non-deterministic - it can be different from execution to execution if the access plan changes. It probably won't matter, but some end users might become confused if the report suddenly changes the order of output ;-).
With the analytic functions we have a third method for a TOP-N query. If we define TOP-N as "find the top N values of QTY, return all items that have sold those quantities", then we should get four rows for group AUTO. This can be accomplished using DENSE_RANK():
This cannot (yet) be accomplished by FETCH FIRST syntax. But as we have seen that FETCH FIRST is internally rewritten as using the analytic functions, then we could easily imagine a syntax similar to this:
Or perhaps something like:
Maybe in some future release this will be possible ;-).
One thing that the analytic method can do is produce TOP-N in "partitions", like for example this one:
There is no PARTITION BY for the FETCH FIRST syntax, but for fun I tried doing it using another new 12c construct - the lateral inline view:
But that uses hardcoded values for the groups, so I went further and produced this:
Sure, this eliminates the hardcoding of the groups, but at the cost of creating a temporary table that is accessed multiple times. No, this is not a good idea, this was just fun to write, but not recommended as a solution. Use the analytic function method of using PARTITION BY instead, it performs much better :-).
So the new FETCH FIRST syntax in 12c is a nice syntax to do some of the analytic TOP-N queries using simpler code more easy to read. But as yet it cannot replace the analytics in all cases, so it will do you good to know analytics even when coding for 12c :-).
The 12c documentation shows the row limiting clause with explanation as well as examples.
The script for the demos above can be downloaded here.
I use the same tables of items and monthly sales as in last years blog post:
create table items( item varchar2(10) primary key, grp varchar2(10), name varchar2(20) ) / create table sales ( item varchar2(10) references items (item), mth date, qty number ) / begin insert into items values ('101010','AUTO','Brake disc'); insert into items values ('102020','AUTO','Snow chain'); insert into items values ('103030','AUTO','Sparc plug'); insert into items values ('104040','AUTO','Oil filter'); insert into items values ('105050','AUTO','Light bulb'); insert into items values ('201010','MOBILE','Handsfree'); insert into items values ('202020','MOBILE','Charger'); insert into items values ('203030','MOBILE','iGloves'); insert into items values ('204040','MOBILE','Headset'); insert into items values ('205050','MOBILE','Cover'); insert into sales values ('101010',date '2011-04-01',10); insert into sales values ('101010',date '2011-05-01',11); insert into sales values ('101010',date '2011-06-01',12); insert into sales values ('102020',date '2011-03-01', 7); insert into sales values ('102020',date '2011-07-01', 8); insert into sales values ('103030',date '2011-01-01', 6); insert into sales values ('103030',date '2011-02-01', 9); insert into sales values ('103030',date '2011-11-01', 4); insert into sales values ('103030',date '2011-12-01',14); insert into sales values ('104040',date '2011-08-01',22); insert into sales values ('105050',date '2011-09-01',13); insert into sales values ('105050',date '2011-10-01',15); insert into sales values ('201010',date '2011-04-01', 5); insert into sales values ('201010',date '2011-05-01', 6); insert into sales values ('201010',date '2011-06-01', 7); insert into sales values ('202020',date '2011-03-01',21); insert into sales values ('202020',date '2011-07-01',23); insert into sales values ('203030',date '2011-01-01', 7); insert into sales values ('203030',date '2011-02-01', 7); insert into sales values ('203030',date '2011-11-01', 6); insert into sales values ('203030',date '2011-12-01', 8); insert into sales values ('204040',date '2011-08-01',35); insert into sales values ('205050',date '2011-09-01',13); insert into sales values ('205050',date '2011-10-01',15); commit; end; /
And the same base query to compute total sales for 2011 by item:
select i.grp , i.item , max(i.name) name , sum(s.qty) qty from items i join sales s on s.item = i.item where s.mth between date '2011-01-01' and date '2011-12-01' group by i.grp, i.item order by i.grp, sum(s.qty) desc, i.item /
GRP ITEM NAME QTY ---------- ---------- -------------------- ---------- AUTO 101010 Brake disc 33 AUTO 103030 Sparc plug 33 AUTO 105050 Light bulb 28 AUTO 104040 Oil filter 22 AUTO 102020 Snow chain 15 MOBILE 202020 Charger 44 MOBILE 204040 Headset 35 MOBILE 203030 iGloves 28 MOBILE 205050 Cover 28 MOBILE 201010 Handsfree 18 10 rows selected.
Pre-12c we could use analytic function ROW_NUMBER() like this:
select g.item , g.name , g.qty from ( select i.item , max(i.name) name , sum(s.qty) qty , row_number() over (order by sum(s.qty) desc) rnk from items i join sales s on s.item = i.item where s.mth between date '2011-01-01' and date '2011-12-01' and i.grp = 'AUTO' group by i.item ) g where g.rnk <= 3 order by g.rnk, g.item /
ITEM NAME QTY ---------- -------------------- ---------- 103030 Sparc plug 33 101010 Brake disc 33 105050 Light bulb 28
select g.item , g.name , g.qty from ( select i.item , max(i.name) name , sum(s.qty) qty , row_number() over (order by sum(s.qty) desc) rnk from items i join sales s on s.item = i.item where s.mth between date '2011-01-01' and date '2011-12-01' and i.grp = 'MOBILE' group by i.item ) g where g.rnk <= 3 order by g.rnk, g.item /
ITEM NAME QTY ---------- -------------------- ---------- 202020 Charger 44 204040 Headset 35 203030 iGloves 28
For the MOBILE group using ROW_NUMBER gives us just one of the two items that sold 28 pieces.
Same results can be found using 12c FETCH FIRST [number] ROWS ONLY:
select i.item , max(i.name) name , sum(s.qty) qty from items i join sales s on s.item = i.item where s.mth between date '2011-01-01' and date '2011-12-01' and i.grp = 'AUTO' group by i.item order by sum(s.qty) desc fetch first 3 rows only /
ITEM NAME QTY ---------- -------------------- ---------- 103030 Sparc plug 33 101010 Brake disc 33 105050 Light bulb 28
select i.item , max(i.name) name , sum(s.qty) qty from items i join sales s on s.item = i.item where s.mth between date '2011-01-01' and date '2011-12-01' and i.grp = 'MOBILE' group by i.item order by sum(s.qty) desc fetch first 3 rows only /
ITEM NAME QTY ---------- -------------------- ---------- 202020 Charger 44 204040 Headset 35 203030 iGloves 28
Let us look at the explain plan for using ROW_NUMBER analytic function:
And then the explain plan for using FETCH FIRST 3 ROWS ONLY:
They are rather identical. FETCH FIRST [number] ROWS ONLY is internally rewritten to use ROW_NUMBER() function - the difference being in the first plan there is an explicit inline view with filter predicate G.RNK <= 3, while the second plan has a system generated inline view filtering on rowlimit_$$_rownumber.
Using both these methods, we only get one of items iGloves and Cover. Which one will be "random" in the sense that Oracle is lazy and just returns the first one it retrieves by whatever access plan it happens to use.
In order to get consistent results we can decide how to "break the ties" if add some more to the ORDER BY to make it unique. For example we might decide that in case of ties, always return lowest item number first:
, row_number() over (order by sum(s.qty) desc, i.item) rnk
Or similarly for 12c:
order by sum(s.qty) desc, i.item fetch first 3 rows only
In order to actually retrieve the tied rows, we can use analytic function RANK() rather than ROW_NUMBER():
select g.item , g.name , g.qty from ( select i.item , max(i.name) name , sum(s.qty) qty , rank() over (order by sum(s.qty) desc) rnk from items i join sales s on s.item = i.item where s.mth between date '2011-01-01' and date '2011-12-01' and i.grp = 'AUTO' group by i.item ) g where g.rnk <= 3 order by g.rnk, g.item /
ITEM NAME QTY ---------- -------------------- ---------- 101010 Brake disc 33 103030 Sparc plug 33 105050 Light bulb 28
select g.item , g.name , g.qty from ( select i.item , max(i.name) name , sum(s.qty) qty , rank() over (order by sum(s.qty) desc) rnk from items i join sales s on s.item = i.item where s.mth between date '2011-01-01' and date '2011-12-01' and i.grp = 'MOBILE' group by i.item ) g where g.rnk <= 3 order by g.rnk, g.item /
ITEM NAME QTY ---------- -------------------- ---------- 202020 Charger 44 204040 Headset 35 203030 iGloves 28 205050 Cover 28
In 12c the same can be accomplished using FETCH FIRST [number] ROWS WITH TIES:
select i.item , max(i.name) name , sum(s.qty) qty from items i join sales s on s.item = i.item where s.mth between date '2011-01-01' and date '2011-12-01' and i.grp = 'AUTO' group by i.item order by sum(s.qty) desc fetch first 3 rows with ties /
ITEM NAME QTY ---------- -------------------- ---------- 103030 Sparc plug 33 101010 Brake disc 33 105050 Light bulb 28
select i.item , max(i.name) name , sum(s.qty) qty from items i join sales s on s.item = i.item where s.mth between date '2011-01-01' and date '2011-12-01' and i.grp = 'MOBILE' group by i.item order by sum(s.qty) desc fetch first 3 rows with ties /
ITEM NAME QTY ---------- -------------------- ---------- 202020 Charger 44 204040 Headset 35 203030 iGloves 28 205050 Cover 28
WITH TIES syntax works the way that once the desired number of rows has been fetched, if there are any more rows with the same value as the last one, then those rows will also be fetched.
Again we can compare the explain plan of using analytic RANK:
With the explain plan of FETCH FIRST 3 ROWS WITH TIES
Again they are very much alike - WITH TIES is rewritten to use RANK() and there is a filter predicate on rowlimit_$$_rank.
One difference here is that using analytic function RANK allows us to control the output order of the tied rows, because we have two ORDER BY clauses:
, rank() over (order by sum(s.qty) desc) rnk ... order by g.rnk, g.item
The ORDER BY within the analytic function is non-unique, therefore we get our ties. But we can guarantee consistent results by making the final ORDER BY in the select "unique."
This is not possible with FETCH FIRST ROWS WITH TIES syntax as we only have one ORDER BY clause, so the order that iGloves and Cover will be presented is non-deterministic - it can be different from execution to execution if the access plan changes. It probably won't matter, but some end users might become confused if the report suddenly changes the order of output ;-).
With the analytic functions we have a third method for a TOP-N query. If we define TOP-N as "find the top N values of QTY, return all items that have sold those quantities", then we should get four rows for group AUTO. This can be accomplished using DENSE_RANK():
select g.item , g.name , g.qty from ( select i.item , max(i.name) name , sum(s.qty) qty , dense_rank() over (order by sum(s.qty) desc) rnk from items i join sales s on s.item = i.item where s.mth between date '2011-01-01' and date '2011-12-01' and i.grp = 'AUTO' group by i.item ) g where g.rnk <= 3 order by g.rnk, g.item /
ITEM NAME QTY ---------- -------------------- ---------- 101010 Brake disc 33 103030 Sparc plug 33 105050 Light bulb 28 104040 Oil filter 22
select g.item , g.name , g.qty from ( select i.item , max(i.name) name , sum(s.qty) qty , dense_rank() over (order by sum(s.qty) desc) rnk from items i join sales s on s.item = i.item where s.mth between date '2011-01-01' and date '2011-12-01' and i.grp = 'MOBILE' group by i.item ) g where g.rnk <= 3 order by g.rnk, g.item /
ITEM NAME QTY ---------- -------------------- ---------- 202020 Charger 44 204040 Headset 35 203030 iGloves 28 205050 Cover 28
This cannot (yet) be accomplished by FETCH FIRST syntax. But as we have seen that FETCH FIRST is internally rewritten as using the analytic functions, then we could easily imagine a syntax similar to this:
select i.item , max(i.name) name , sum(s.qty) qty from items i join sales s on s.item = i.item where s.mth between date '2011-01-01' and date '2011-12-01' and i.grp = 'AUTO' group by i.item order by sum(s.qty) desc fetch first 3 rows dense_ranked /* hypothetical and unsupported syntax */ /
Or perhaps something like:
fetch first 3 values /* hypothetical and unsupported syntax */
Maybe in some future release this will be possible ;-).
One thing that the analytic method can do is produce TOP-N in "partitions", like for example this one:
select g.grp , g.item , g.name , g.qty from ( select i.grp , i.item , max(i.name) name , sum(s.qty) qty , rank() over (partition by i.grp order by sum(s.qty) desc) rnk from items i join sales s on s.item = i.item where s.mth between date '2011-01-01' and date '2011-12-01' group by i.grp, i.item ) g where g.rnk <= 3 order by g.grp, g.rnk, g.item /
GRP ITEM NAME QTY ---------- ---------- -------------------- ---------- AUTO 101010 Brake disc 33 AUTO 103030 Sparc plug 33 AUTO 105050 Light bulb 28 MOBILE 202020 Charger 44 MOBILE 204040 Headset 35 MOBILE 203030 iGloves 28 MOBILE 205050 Cover 28 7 rows selected.
There is no PARTITION BY for the FETCH FIRST syntax, but for fun I tried doing it using another new 12c construct - the lateral inline view:
select g.grp , r.item , r.name , r.qty from ( select 'AUTO' grp from dual union all select 'MOBILE' grp from dual ) g cross join lateral ( select i.item , max(i.name) name , sum(s.qty) qty from items i join sales s on s.item = i.item where s.mth between date '2011-01-01' and date '2011-12-01' and i.grp = g.grp group by i.item order by sum(s.qty) desc fetch first 3 rows with ties ) r order by g.grp, r.qty desc, r.item /
GRP ITEM NAME QTY ------ ---------- -------------------- ---------- AUTO 101010 Brake disc 33 AUTO 103030 Sparc plug 33 AUTO 105050 Light bulb 28 MOBILE 202020 Charger 44 MOBILE 204040 Headset 35 MOBILE 203030 iGloves 28 MOBILE 205050 Cover 28 7 rows selected.
But that uses hardcoded values for the groups, so I went further and produced this:
with data as ( select i.grp , i.item , max(i.name) name , sum(s.qty) qty from items i join sales s on s.item = i.item where s.mth between date '2011-01-01' and date '2011-12-01' group by i.grp, i.item ) select g.grp , r.item , r.name , r.qty from ( select distinct d.grp from data d ) g cross join lateral ( select d.item , d.name , d.qty from data d where d.grp = g.grp order by d.qty desc fetch first 3 rows with ties ) r order by g.grp, r.qty desc, r.item /
GRP ITEM NAME QTY ---------- ---------- -------------------- ---------- AUTO 101010 Brake disc 33 AUTO 103030 Sparc plug 33 AUTO 105050 Light bulb 28 MOBILE 202020 Charger 44 MOBILE 204040 Headset 35 MOBILE 203030 iGloves 28 MOBILE 205050 Cover 28 7 rows selected.
Sure, this eliminates the hardcoding of the groups, but at the cost of creating a temporary table that is accessed multiple times. No, this is not a good idea, this was just fun to write, but not recommended as a solution. Use the analytic function method of using PARTITION BY instead, it performs much better :-).
So the new FETCH FIRST syntax in 12c is a nice syntax to do some of the analytic TOP-N queries using simpler code more easy to read. But as yet it cannot replace the analytics in all cases, so it will do you good to know analytics even when coding for 12c :-).
The 12c documentation shows the row limiting clause with explanation as well as examples.
The script for the demos above can be downloaded here.
Comments
Post a Comment