RANGE BETWEEN and leap years
Answering a question on the OTN forum was a bit tricky to get an analytic sum using a RANGE BETWEEN that would handle leap years, but in the end I came up with a workaround that satisfies the requirement. Along the way I realized why there are two different INTERVAL datatypes :-)
Let's make a sales table to demo this:
And populate with some data for specific days in 2010, 2011 and 2012:
First business requirement is "for each day we want the sales of that day + the three preceding days" - no problem:
Then is added another requirement "we also want the sales for the same 4 days of the previous year".
Well we might try this one then:
Now why won't that work? Hmmm... OK, we can try with days instead of years:
Works for 2011 but not for 2012, because 2012 is a leap year. OK, we use a different set of days:
Well, now 2012 works but not 2011. We need somehow to identify if the year has 366 or 365 days. For that purpose we can use the year interval:
We saw that the correct range was 369-366 for leap years and 368-365 for non-leap years. So let's get data for the ranges 369, 368-366 and 365:
And then we can calculate the correct sum:
(I used the same numbers for all three years to make it easy to spot when four_lastyear is correct :-)
Let's just take a closer look at the interval calculation:
No problem. That calculation is first substracting an INTERVAL YEAR TO MONTH literal from a DATE column - the result will be a DATE. Then it subtracts an INTERVAL DAY TO SECOND literal from the resulting DATE which in turn returns yet another DATE. Perfect.
Now look at this one - strictly "mathematically" this should be equivalent:
Now we fail - because we cannot add an INTERVAL DAY TO SECOND datatype to an INTERVAL YEAR TO MONTH.
Why not? Because an interval of one year or an interval of one month cannot be expressed as a certain number of days!
Subtract 1 year from a date in October 2012 will subtract 366 days, but subtracting 1 year from a date in October 2011 will subtract 365 days.
And so it is not possible to express "one year and 3 days" in a single interval expression - it needs two interval expressions of two different datatypes.
And therefore it cannot be used in a RANGE BETWEEN expression - we need to do the cryptic workaround shown above.
Let's make a sales table to demo this:
create table sales ( day date , qty number ) /
And populate with some data for specific days in 2010, 2011 and 2012:
begin insert into sales values (date '2010-10-01', 1); insert into sales values (date '2010-10-02', 2); insert into sales values (date '2010-10-03', 3); insert into sales values (date '2010-10-04', 4); insert into sales values (date '2010-10-05', 5); insert into sales values (date '2010-10-06', 6); insert into sales values (date '2010-10-07', 7); insert into sales values (date '2011-10-01', 1); insert into sales values (date '2011-10-02', 2); insert into sales values (date '2011-10-03', 3); insert into sales values (date '2011-10-04', 4); insert into sales values (date '2011-10-05', 5); insert into sales values (date '2011-10-06', 6); insert into sales values (date '2011-10-07', 7); insert into sales values (date '2012-10-01', 1); insert into sales values (date '2012-10-02', 2); insert into sales values (date '2012-10-03', 3); insert into sales values (date '2012-10-04', 4); insert into sales values (date '2012-10-05', 5); insert into sales values (date '2012-10-06', 6); insert into sales values (date '2012-10-07', 7); commit; end; /
First business requirement is "for each day we want the sales of that day + the three preceding days" - no problem:
select to_char(day, 'YYYY-MM-DD') day , qty , sum(qty) over ( order by day range between interval '3' day preceding and current row ) four_days from sales order by day /
DAY QTY FOUR_DAYS ---------- ---------- ---------- 2010-10-01 1 1 2010-10-02 2 3 2010-10-03 3 6 2010-10-04 4 10 2010-10-05 5 14 2010-10-06 6 18 2010-10-07 7 22 2011-10-01 1 1 2011-10-02 2 3 2011-10-03 3 6 2011-10-04 4 10 2011-10-05 5 14 2011-10-06 6 18 2011-10-07 7 22 2012-10-01 1 1 2012-10-02 2 3 2012-10-03 3 6 2012-10-04 4 10 2012-10-05 5 14 2012-10-06 6 18 2012-10-07 7 22 21 rows selected.
Then is added another requirement "we also want the sales for the same 4 days of the previous year".
Well we might try this one then:
select to_char(day, 'YYYY-MM-DD') day , qty , sum(qty) over ( order by day range between interval '3' day preceding and current row ) four_days , sum(qty) over ( order by day range between interval '1' year + interval '3' day preceding and interval '1' year preceding ) four_lastyear from sales order by day /
range between interval '1' year + interval '3' day preceding * ERROR at line 10: ORA-30081: invalid data type for datetime/interval arithmetic
Now why won't that work? Hmmm... OK, we can try with days instead of years:
select to_char(day, 'YYYY-MM-DD') day , qty , sum(qty) over ( order by day range between interval '3' day preceding and current row ) four_days , sum(qty) over ( order by day range between interval '368' day(3) preceding and interval '365' day(3) preceding ) four_lastyear from sales order by day /
DAY QTY FOUR_DAYS FOUR_LASTYEAR ---------- ---------- ---------- ------------- 2010-10-01 1 1 2010-10-02 2 3 2010-10-03 3 6 2010-10-04 4 10 2010-10-05 5 14 2010-10-06 6 18 2010-10-07 7 22 2011-10-01 1 1 1 2011-10-02 2 3 3 2011-10-03 3 6 6 2011-10-04 4 10 10 2011-10-05 5 14 14 2011-10-06 6 18 18 2011-10-07 7 22 22 2012-10-01 1 1 3 2012-10-02 2 3 6 2012-10-03 3 6 10 2012-10-04 4 10 14 2012-10-05 5 14 18 2012-10-06 6 18 22 2012-10-07 7 22 18 21 rows selected.
Works for 2011 but not for 2012, because 2012 is a leap year. OK, we use a different set of days:
select to_char(day, 'YYYY-MM-DD') day , qty , sum(qty) over ( order by day range between interval '3' day preceding and current row ) four_days , sum(qty) over ( order by day range between interval '369' day(3) preceding and interval '366' day(3) preceding ) four_lastyear from sales order by day /
DAY QTY FOUR_DAYS FOUR_LASTYEAR ---------- ---------- ---------- ------------- 2010-10-01 1 1 2010-10-02 2 3 2010-10-03 3 6 2010-10-04 4 10 2010-10-05 5 14 2010-10-06 6 18 2010-10-07 7 22 2011-10-01 1 1 2011-10-02 2 3 1 2011-10-03 3 6 3 2011-10-04 4 10 6 2011-10-05 5 14 10 2011-10-06 6 18 14 2011-10-07 7 22 18 2012-10-01 1 1 1 2012-10-02 2 3 3 2012-10-03 3 6 6 2012-10-04 4 10 10 2012-10-05 5 14 14 2012-10-06 6 18 18 2012-10-07 7 22 22 21 rows selected.
Well, now 2012 works but not 2011. We need somehow to identify if the year has 366 or 365 days. For that purpose we can use the year interval:
select to_char( day , 'YYYY-MM-DD' ) day , to_char( day - interval '1' year , 'YYYY-MM-DD' ) day_lastyear , day - (day - interval '1' year) days from sales order by day /
DAY DAY_LASTYE DAYS ---------- ---------- ---------- 2010-10-01 2009-10-01 365 2010-10-02 2009-10-02 365 2010-10-03 2009-10-03 365 2010-10-04 2009-10-04 365 2010-10-05 2009-10-05 365 2010-10-06 2009-10-06 365 2010-10-07 2009-10-07 365 2011-10-01 2010-10-01 365 2011-10-02 2010-10-02 365 2011-10-03 2010-10-03 365 2011-10-04 2010-10-04 365 2011-10-05 2010-10-05 365 2011-10-06 2010-10-06 365 2011-10-07 2010-10-07 365 2012-10-01 2011-10-01 366 2012-10-02 2011-10-02 366 2012-10-03 2011-10-03 366 2012-10-04 2011-10-04 366 2012-10-05 2011-10-05 366 2012-10-06 2011-10-06 366 2012-10-07 2011-10-07 366 21 rows selected.
We saw that the correct range was 369-366 for leap years and 368-365 for non-leap years. So let's get data for the ranges 369, 368-366 and 365:
select to_char(day, 'YYYY-MM-DD') day , qty , sum(qty) over ( order by day range between interval '3' day preceding and current row ) four_days , sum(qty) over ( order by day range between interval '369' day(3) preceding and interval '369' day(3) preceding ) d1 , sum(qty) over ( order by day range between interval '368' day(3) preceding and interval '366' day(3) preceding ) d2 , sum(qty) over ( order by day range between interval '365' day(3) preceding and interval '365' day(3) preceding ) d3 from sales order by day /
DAY QTY FOUR_DAYS D1 D2 D3 ---------- ---------- ---------- ---------- ---------- ---------- 2010-10-01 1 1 2010-10-02 2 3 2010-10-03 3 6 2010-10-04 4 10 2010-10-05 5 14 2010-10-06 6 18 2010-10-07 7 22 2011-10-01 1 1 1 2011-10-02 2 3 1 2 2011-10-03 3 6 3 3 2011-10-04 4 10 6 4 2011-10-05 5 14 1 9 5 2011-10-06 6 18 2 12 6 2011-10-07 7 22 3 15 7 2012-10-01 1 1 1 2 2012-10-02 2 3 3 3 2012-10-03 3 6 6 4 2012-10-04 4 10 1 9 5 2012-10-05 5 14 2 12 6 2012-10-06 6 18 3 15 7 2012-10-07 7 22 4 18 21 rows selected.
And then we can calculate the correct sum:
select to_char(day, 'YYYY-MM-DD') day , qty , four_days , case day - (day - interval '1' year) when 366 then nvl(d1,0) + nvl(d2,0) else nvl(d2,0) + nvl(d3,0) end four_lastyear from ( select day , qty , sum(qty) over ( order by day range between interval '3' day preceding and current row ) four_days , sum(qty) over ( order by day range between interval '369' day(3) preceding and interval '369' day(3) preceding ) d1 , sum(qty) over ( order by day range between interval '368' day(3) preceding and interval '366' day(3) preceding ) d2 , sum(qty) over ( order by day range between interval '365' day(3) preceding and interval '365' day(3) preceding ) d3 from sales ) order by day /
DAY QTY FOUR_DAYS FOUR_LASTYEAR ---------- ---------- ---------- ------------- 2010-10-01 1 1 0 2010-10-02 2 3 0 2010-10-03 3 6 0 2010-10-04 4 10 0 2010-10-05 5 14 0 2010-10-06 6 18 0 2010-10-07 7 22 0 2011-10-01 1 1 1 2011-10-02 2 3 3 2011-10-03 3 6 6 2011-10-04 4 10 10 2011-10-05 5 14 14 2011-10-06 6 18 18 2011-10-07 7 22 22 2012-10-01 1 1 1 2012-10-02 2 3 3 2012-10-03 3 6 6 2012-10-04 4 10 10 2012-10-05 5 14 14 2012-10-06 6 18 18 2012-10-07 7 22 22 21 rows selected.
(I used the same numbers for all three years to make it easy to spot when four_lastyear is correct :-)
Let's just take a closer look at the interval calculation:
select to_char( day , 'YYYY-MM-DD' ) day , to_char( day - interval '1' year - interval '3' day , 'YYYY-MM-DD' ) day_lastthree from sales order by day /
DAY DAY_LASTTH ---------- ---------- 2010-10-01 2009-09-28 2010-10-02 2009-09-29 2010-10-03 2009-09-30 2010-10-04 2009-10-01 2010-10-05 2009-10-02 2010-10-06 2009-10-03 2010-10-07 2009-10-04 2011-10-01 2010-09-28 2011-10-02 2010-09-29 2011-10-03 2010-09-30 2011-10-04 2010-10-01 2011-10-05 2010-10-02 2011-10-06 2010-10-03 2011-10-07 2010-10-04 2012-10-01 2011-09-28 2012-10-02 2011-09-29 2012-10-03 2011-09-30 2012-10-04 2011-10-01 2012-10-05 2011-10-02 2012-10-06 2011-10-03 2012-10-07 2011-10-04 21 rows selected.
No problem. That calculation is first substracting an INTERVAL YEAR TO MONTH literal from a DATE column - the result will be a DATE. Then it subtracts an INTERVAL DAY TO SECOND literal from the resulting DATE which in turn returns yet another DATE. Perfect.
Now look at this one - strictly "mathematically" this should be equivalent:
select to_char( day , 'YYYY-MM-DD' ) day , to_char( day - (interval '1' year + interval '3' day) , 'YYYY-MM-DD' ) day_lastthree from sales order by day /
day - (interval '1' year + interval '3' day) * ERROR at line 6: ORA-30081: invalid data type for datetime/interval arithmetic
Now we fail - because we cannot add an INTERVAL DAY TO SECOND datatype to an INTERVAL YEAR TO MONTH.
Why not? Because an interval of one year or an interval of one month cannot be expressed as a certain number of days!
Subtract 1 year from a date in October 2012 will subtract 366 days, but subtracting 1 year from a date in October 2011 will subtract 365 days.
And so it is not possible to express "one year and 3 days" in a single interval expression - it needs two interval expressions of two different datatypes.
And therefore it cannot be used in a RANGE BETWEEN expression - we need to do the cryptic workaround shown above.
Comments
Post a Comment