Top selling items
A classic task given to a programmer is to make a TOP-N report of some data. Often a TOP within each group of some defined grouping (department, country, product type, etc.) And many times the report should also include the percentage of the total - even though you only display the TOP records.
Many ways can be devised for this and many ways has been used and blogged about and given lectures on. This is nothing new, but it is a technique we often use in many practical cases. One typical case is a top list of best selling items within product groups.
Many people have written and shown this technique, you can google it and see... But you will hear me speak on this at KSCope12 (you are going to be there, aren't you?!) so I might as well put this demo on the blog for your convenience ;-) The demo script for this can be downloaded here.
We create a couple of tables first. A table of items with a column grp to identify the product group the item belongs to, and a table of sales history. For this demo the sales table stores sales history by the month, but the technique would be identical if it were weekly or daily data or even if it were individual order lines.
We populate the tables with 10 typical items for our shop - 5 spare parts for cars, 5 accessories for mobile phones.
Now let’s see the result of an ordinary aggregate group by for sales data for the year 2011:
Ordering by the sum shows us which items are the top three selling items in each group. Let us see if we cannot do the same with analytic functions :-)
First let’s try using the aggregate query as source for three different analytic functions - DENSE_RANK(), RANK() and ROW_NUMBER(). Each of them we do PARTITION BY in order to get a ranking within each produkt group and we use ORDER BY to specify the ranking order:
Notice the differences in the three ranking functions:
You can get the same result as above without using an inline view. It is possible simply to use the analytic functions directly on the aggregations:
The effect is the same - the aggregation is done first and then the analytic functions are applied. Sometimes it can be nice to do the inline view anyway for readability and to make it clear what happens, but for simpler cases like this I tend to prefer the compact method. One of the reasons is that often you need to put the entire thing in another inline view anyway in order to filter the result, and then you can get confused by too many inline views inside inline views :-)
The analytic functions cannot be used directly in a where clause, so when we wish to filter on the ranking function in order to get our TOP-3 report, we need an inline view:
Here we used RANK and thus we got three AUTO items and four MOBILE items. Let us try DENSE_RANK:
That gave us four items in each group. So how about ROW_NUMBER:
That gave us exactly three items in each group. Notice here that even though both iGloves and Cover sold 28 pieces and ties for third place in group MOBILE, we only get iGloves here. That is because we told Oracle to resolve ties by ordering by item id and 203030 is less than 205050. If we had omitted the “, i.item” part of the ORDER BY for ROW_NUMBER(), then it would have been random whether we got iGloves or Cover.
Along with the quantity sold and the rank, we would also like to know how big a percentage of the total sales each item had. And we would like that information both within the product groups, but at the same time also how big percentage of the grand total sold.
For that we use RATIO_TO_REPORT which returns the ratio of the particular row to the total (then we multiply the ratio by 100 to get percent.)
If we use the PARTITION BY clause, then we can get a percentage within the group. If we omit the PARTITION BY, then we get a percentage of the grand total:
Here we see that light bulb was 21.4% of the total sales in the AUTO group and 9.9% of the grand total sales.
This is technique we use often in many of our reports. Sometimes (as this case) we combine aggregation with analytics, where the aggregation creates the set of data that the analytic functions then work on. Sometimes the data is ready for use by the analytic functions and no aggregation is needed. That depends on the actual case whether you need aggregation combined with analytics.
Read more in the Oracle documentation on DENSE_RANK, RANK, ROW_NUMBER and RATIO_TO_REPORT.
Many ways can be devised for this and many ways has been used and blogged about and given lectures on. This is nothing new, but it is a technique we often use in many practical cases. One typical case is a top list of best selling items within product groups.
Many people have written and shown this technique, you can google it and see... But you will hear me speak on this at KSCope12 (you are going to be there, aren't you?!) so I might as well put this demo on the blog for your convenience ;-) The demo script for this can be downloaded here.
We create a couple of tables first. A table of items with a column grp to identify the product group the item belongs to, and a table of sales history. For this demo the sales table stores sales history by the month, but the technique would be identical if it were weekly or daily data or even if it were individual order lines.
SQL> create table items(
2 item varchar2(10) primary key,
3 grp varchar2(10),
4 name varchar2(20)
5 )
6 /
Table created.
SQL> create table sales (
2 item varchar2(10) references items (item),
3 mth date,
4 qty number
5 )
6 /
Table created.
We populate the tables with 10 typical items for our shop - 5 spare parts for cars, 5 accessories for mobile phones.
SQL> begin
2 insert into items values ('101010','AUTO','Brake disc');
3 insert into items values ('102020','AUTO','Snow chain');
4 insert into items values ('103030','AUTO','Sparc plug');
5 insert into items values ('104040','AUTO','Oil filter');
6 insert into items values ('105050','AUTO','Light bulb');
7
8 insert into items values ('201010','MOBILE','Handsfree');
9 insert into items values ('202020','MOBILE','Charger');
10 insert into items values ('203030','MOBILE','iGloves');
11 insert into items values ('204040','MOBILE','Headset');
12 insert into items values ('205050','MOBILE','Cover');
13
14 insert into sales values ('101010',date '2011-04-01',10);
15 insert into sales values ('101010',date '2011-05-01',11);
16 insert into sales values ('101010',date '2011-06-01',12);
17 insert into sales values ('102020',date '2011-03-01', 7);
18 insert into sales values ('102020',date '2011-07-01', 8);
19 insert into sales values ('103030',date '2011-01-01', 6);
20 insert into sales values ('103030',date '2011-02-01', 9);
21 insert into sales values ('103030',date '2011-11-01', 4);
22 insert into sales values ('103030',date '2011-12-01',14);
23 insert into sales values ('104040',date '2011-08-01',22);
24 insert into sales values ('105050',date '2011-09-01',13);
25 insert into sales values ('105050',date '2011-10-01',15);
26
27 insert into sales values ('201010',date '2011-04-01', 5);
28 insert into sales values ('201010',date '2011-05-01', 6);
29 insert into sales values ('201010',date '2011-06-01', 7);
30 insert into sales values ('202020',date '2011-03-01',21);
31 insert into sales values ('202020',date '2011-07-01',23);
32 insert into sales values ('203030',date '2011-01-01', 7);
33 insert into sales values ('203030',date '2011-02-01', 7);
34 insert into sales values ('203030',date '2011-11-01', 6);
35 insert into sales values ('203030',date '2011-12-01', 8);
36 insert into sales values ('204040',date '2011-08-01',35);
37 insert into sales values ('205050',date '2011-09-01',13);
38 insert into sales values ('205050',date '2011-10-01',15);
39
40 commit;
41 end;
42 /
PL/SQL procedure successfully completed.
Now let’s see the result of an ordinary aggregate group by for sales data for the year 2011:
SQL> select i.grp
2 , i.item
3 , max(i.name) name
4 , sum(s.qty) qty
5 from items i
6 join sales s
7 on s.item = i.item
8 where s.mth between date '2011-01-01' and date '2011-12-01'
9 group by i.grp, i.item
10 order by i.grp, sum(s.qty) desc, i.item
11 /
GRP ITEM NAME QTY
---------- ---------- -------------------- ----------
AUTO 101010 Brake disc 33
AUTO 103030 Sparc plug 33
AUTO 105050 Light bulb 28
AUTO 104040 Oil filter 22
AUTO 102020 Snow chain 15
MOBILE 202020 Charger 44
MOBILE 204040 Headset 35
MOBILE 203030 iGloves 28
MOBILE 205050 Cover 28
MOBILE 201010 Handsfree 18
10 rows selected.
Ordering by the sum shows us which items are the top three selling items in each group. Let us see if we cannot do the same with analytic functions :-)
First let’s try using the aggregate query as source for three different analytic functions - DENSE_RANK(), RANK() and ROW_NUMBER(). Each of them we do PARTITION BY in order to get a ranking within each produkt group and we use ORDER BY to specify the ranking order:
SQL> select g.grp
2 , g.item
3 , g.name
4 , g.qty
5 , dense_rank() over (partition by g.grp order by g.qty desc) drnk
6 , rank() over (partition by g.grp order by g.qty desc) rnk
7 , row_number() over (partition by g.grp order by g.qty desc, g.item) rnum
8 from (
9 select i.grp
10 , i.item
11 , max(i.name) name
12 , sum(s.qty) qty
13 from items i
14 join sales s
15 on s.item = i.item
16 where s.mth between date '2011-01-01' and date '2011-12-01'
17 group by i.grp, i.item
18 ) g
19 order by g.grp, g.qty desc, g.item
20 /
GRP ITEM NAME QTY DRNK RNK RNUM
---------- ---------- ------------- ---------- ---------- ---------- ----------
AUTO 101010 Brake disc 33 1 1 1
AUTO 103030 Sparc plug 33 1 1 2
AUTO 105050 Light bulb 28 2 3 3
AUTO 104040 Oil filter 22 3 4 4
AUTO 102020 Snow chain 15 4 5 5
MOBILE 202020 Charger 44 1 1 1
MOBILE 204040 Headset 35 2 2 2
MOBILE 203030 iGloves 28 3 3 3
MOBILE 205050 Cover 28 3 3 4
MOBILE 201010 Handsfree 18 4 5 5
10 rows selected.
Notice the differences in the three ranking functions:
- DENSE_RANK gives the same rank to items having the same value - brake disc and sparc plug both get 1, light bulb gets 2.
- RANK does the same, but it works like olympic medals: if there are two gold medals then there is no silver medal, the next one gets bronze - here light bulb gets 3.
- ROW_NUMBER never gives the same rank, it will always give 1, 2, 3, 4 … If two items have the same value it will be “random” which one gets to be first - here brake disc gets 1 and sparc plug gets 2, but that is because we gave the “ORDER BY g.qty desc, g.item”, this tells Oracle that if there is a tie for g.qty, then use item id for ordering and breaking the tie. For ROW_NUMBER it is good practice to always use an ORDER BY clause that ensures consistent ordering - otherwise you might get different results in different runs of the query.
You can get the same result as above without using an inline view. It is possible simply to use the analytic functions directly on the aggregations:
SQL> select i.grp
2 , i.item
3 , max(i.name) name
4 , sum(s.qty) qty
5 , dense_rank() over (partition by i.grp order by sum(s.qty) desc) drnk
6 , rank() over (partition by i.grp order by sum(s.qty) desc) rnk
7 , row_number() over (partition by i.grp order by sum(s.qty) desc, i.item) rnum
8 from items i
9 join sales s
10 on s.item = i.item
11 where s.mth between date '2011-01-01' and date '2011-12-01'
12 group by i.grp, i.item
13 order by i.grp, sum(s.qty) desc, i.item
14 /
GRP ITEM NAME QTY DRNK RNK RNUM
---------- ---------- ------------- ---------- ---------- ---------- ----------
AUTO 101010 Brake disc 33 1 1 1
AUTO 103030 Sparc plug 33 1 1 2
AUTO 105050 Light bulb 28 2 3 3
AUTO 104040 Oil filter 22 3 4 4
AUTO 102020 Snow chain 15 4 5 5
MOBILE 202020 Charger 44 1 1 1
MOBILE 204040 Headset 35 2 2 2
MOBILE 203030 iGloves 28 3 3 3
MOBILE 205050 Cover 28 3 3 4
MOBILE 201010 Handsfree 18 4 5 5
10 rows selected.
The effect is the same - the aggregation is done first and then the analytic functions are applied. Sometimes it can be nice to do the inline view anyway for readability and to make it clear what happens, but for simpler cases like this I tend to prefer the compact method. One of the reasons is that often you need to put the entire thing in another inline view anyway in order to filter the result, and then you can get confused by too many inline views inside inline views :-)
The analytic functions cannot be used directly in a where clause, so when we wish to filter on the ranking function in order to get our TOP-3 report, we need an inline view:
SQL> select g.grp
2 , g.item
3 , g.name
4 , g.qty
5 , g.rnk
6 from (
7 select i.grp
8 , i.item
9 , max(i.name) name
10 , sum(s.qty) qty
11 , rank() over (partition by i.grp order by sum(s.qty) desc) rnk
12 from items i
13 join sales s
14 on s.item = i.item
15 where s.mth between date '2011-01-01' and date '2011-12-01'
16 group by i.grp, i.item
17 ) g
18 where g.rnk <= 3
19 order by g.grp, g.rnk, g.item
20 /
GRP ITEM NAME QTY RNK
---------- ---------- -------------------- ---------- ----------
AUTO 101010 Brake disc 33 1
AUTO 103030 Sparc plug 33 1
AUTO 105050 Light bulb 28 3
MOBILE 202020 Charger 44 1
MOBILE 204040 Headset 35 2
MOBILE 203030 iGloves 28 3
MOBILE 205050 Cover 28 3
7 rows selected.
Here we used RANK and thus we got three AUTO items and four MOBILE items. Let us try DENSE_RANK:
SQL> select g.grp
2 , g.item
3 , g.name
4 , g.qty
5 , g.rnk
6 from (
7 select i.grp
8 , i.item
9 , max(i.name) name
10 , sum(s.qty) qty
11 , dense_rank() over (partition by i.grp order by sum(s.qty) desc) rnk
12 from items i
13 join sales s
14 on s.item = i.item
15 where s.mth between date '2011-01-01' and date '2011-12-01'
16 group by i.grp, i.item
17 ) g
18 where g.rnk <= 3
19 order by g.grp, g.rnk, g.item
20 /
GRP ITEM NAME QTY RNK
---------- ---------- -------------------- ---------- ----------
AUTO 101010 Brake disc 33 1
AUTO 103030 Sparc plug 33 1
AUTO 105050 Light bulb 28 2
AUTO 104040 Oil filter 22 3
MOBILE 202020 Charger 44 1
MOBILE 204040 Headset 35 2
MOBILE 203030 iGloves 28 3
MOBILE 205050 Cover 28 3
8 rows selected.
That gave us four items in each group. So how about ROW_NUMBER:
SQL> select g.grp
2 , g.item
3 , g.name
4 , g.qty
5 , g.rnk
6 from (
7 select i.grp
8 , i.item
9 , max(i.name) name
10 , sum(s.qty) qty
11 , row_number() over (partition by i.grp order by sum(s.qty) desc, i.item) rnk
12 from items i
13 join sales s
14 on s.item = i.item
15 where s.mth between date '2011-01-01' and date '2011-12-01'
16 group by i.grp, i.item
17 ) g
18 where g.rnk <= 3
19 order by g.grp, g.rnk, g.item
20 /
GRP ITEM NAME QTY RNK
---------- ---------- -------------------- ---------- ----------
AUTO 101010 Brake disc 33 1
AUTO 103030 Sparc plug 33 2
AUTO 105050 Light bulb 28 3
MOBILE 202020 Charger 44 1
MOBILE 204040 Headset 35 2
MOBILE 203030 iGloves 28 3
6 rows selected.
That gave us exactly three items in each group. Notice here that even though both iGloves and Cover sold 28 pieces and ties for third place in group MOBILE, we only get iGloves here. That is because we told Oracle to resolve ties by ordering by item id and 203030 is less than 205050. If we had omitted the “, i.item” part of the ORDER BY for ROW_NUMBER(), then it would have been random whether we got iGloves or Cover.
Along with the quantity sold and the rank, we would also like to know how big a percentage of the total sales each item had. And we would like that information both within the product groups, but at the same time also how big percentage of the grand total sold.
For that we use RATIO_TO_REPORT which returns the ratio of the particular row to the total (then we multiply the ratio by 100 to get percent.)
If we use the PARTITION BY clause, then we can get a percentage within the group. If we omit the PARTITION BY, then we get a percentage of the grand total:
SQL> select g.grp
2 , g.item
3 , g.name
4 , g.qty
5 , g.rnk
6 , round(g.g_pct,1) g_pct
7 , round(g.t_pct,1) t_pct
8 from (
9 select i.grp
10 , i.item
11 , max(i.name) name
12 , sum(s.qty) qty
13 , rank() over (partition by i.grp order by sum(s.qty) desc) rnk
14 , 100 * ratio_to_report(sum(s.qty)) over (partition by i.grp) g_pct
15 , 100 * ratio_to_report(sum(s.qty)) over () t_pct
16 from items i
17 join sales s
18 on s.item = i.item
19 where s.mth between date '2011-01-01' and date '2011-12-01'
20 group by i.grp, i.item
21 ) g
22 where g.rnk <= 3
23 order by g.grp, g.rnk, g.item
24 /
GRP ITEM NAME QTY RNK G_PCT T_PCT
---------- ---------- ------------- ---------- ---------- ---------- ----------
AUTO 101010 Brake disc 33 1 25.2 11.6
AUTO 103030 Sparc plug 33 1 25.2 11.6
AUTO 105050 Light bulb 28 3 21.4 9.9
MOBILE 202020 Charger 44 1 28.8 15.5
MOBILE 204040 Headset 35 2 22.9 12.3
MOBILE 203030 iGloves 28 3 18.3 9.9
MOBILE 205050 Cover 28 3 18.3 9.9
7 rows selected.
Here we see that light bulb was 21.4% of the total sales in the AUTO group and 9.9% of the grand total sales.
This is technique we use often in many of our reports. Sometimes (as this case) we combine aggregation with analytics, where the aggregation creates the set of data that the analytic functions then work on. Sometimes the data is ready for use by the analytic functions and no aggregation is needed. That depends on the actual case whether you need aggregation combined with analytics.
Read more in the Oracle documentation on DENSE_RANK, RANK, ROW_NUMBER and RATIO_TO_REPORT.
very nicely explained.
ReplyDelete