Temporal validity and open/closed intervals
A short while ago I created a SQL quiz for Oracle Dev Gym (PL/SQL Challenge) demonstrating the use of temporal validity and VERSIONS PERIOD FOR syntax to create a "change report" for changes in item prices. Use of temporal validity makes this easier, shorter and more readable code, but even though I thought I had it all covered, player Iudith Mentzel pointed out a tiny quirk I'd overlooked.
You see, the "change in price" wasn't really happening at one specific point in time - the old price was valid up to (but not including) a point in time (open-ended interval) and the new price became valid from (and including) that point in time (close-ended interval). If my "change report" happened to match those "ends" of the validity intervals exactly, it would never catch the "change" happening.
All of the SQL shown here can be found in the demo script here.
So for this demonstration I'll have a price table with a temporal validity period valid_price defined on it. I'll assume in this case each item will always have exactly one valid price at a given point in time - that is, that the periods for one item never overlaps and never has gaps in them.
VALID_FROM is included in the valid_price period interval (closed interval), VALID_UP_TO is excluded from valid_price period interval (open interval), and NULL in either date column means "from/to infinity".
So let's just see an overview of the data:
With AS OF we can view valid prices at a specific point in time:
Typically used in a query for "currently valid price":
Note that VALID_UP_TO being "open-ended interval" means the Santabead price 100 (that has VALID_UP_TO value of 2016-12-11 00:00:00) is valid at 1 second before midnight:
But at midnight, the price 100 is no longer valid, instead the price is 75:
While AS OF gives valid price at specific point in time, VERSIONS BETWEEN gives all prices that are valid within a time period:
We can use analytic LEAD function to have the VALID_UP_TO and PRICE of the "old" row and the VALID_FROM and PRICE of the "new" row together
Which can lead us to a query showing "price changes in December":
All good - we have a "change report" query. But now let's look at valid prices in November instead of December:
And then we try the "change report" for November:
OK, we've now shown the change report for the entirety of November, which reports a single price change for Beachball. Before we had the change report for the entirety of December, which reports no price changes for Beachball.
But aren't we missing something? How about if we make a change report for November+December together?
This time we spot that there was a change for Beachball from a price of 12 valid up to (not including) 2016-12-01 to a price of 15 valid from 2016-12-01. Even though our two previous change reports covered the same time period, neither reported this change.
The "problem" is the price interval "ends" match completely the intervals we are using in our November and December price change report, thus we won't get "old" and "new" row in the same set of data.
If we want such a "price change" report, we'll have to use a "closed" interval for the month we are reporting for, so that when we do next months report, we'll actually have a one second "overlap" with the report from this month:
This isn't demonstrating any problem with temporal validity. The interval definitions are as they have to be and temporal validity is simply an easier way to deal with these things than doing it yourself with predicates involving proper combinations of AND, OR, >=, < and IS NULL.
But no matter if you do it yourself or you do it with temporal validity, you should beware of how "closed" and "open" intervals work and why the "semi-closed" intervals in temporal validity sometimes can fool you, like missing that I'd actually need "overlap" when doing an otherwise seemingly straight-forward "price change report".
Thank $deity for sharp Dev Gym players - I can always count on them to discover the little quirks I miss myself.
You see, the "change in price" wasn't really happening at one specific point in time - the old price was valid up to (but not including) a point in time (open-ended interval) and the new price became valid from (and including) that point in time (close-ended interval). If my "change report" happened to match those "ends" of the validity intervals exactly, it would never catch the "change" happening.
All of the SQL shown here can be found in the demo script here.
So for this demonstration I'll have a price table with a temporal validity period valid_price defined on it. I'll assume in this case each item will always have exactly one valid price at a given point in time - that is, that the periods for one item never overlaps and never has gaps in them.
VALID_FROM is included in the valid_price period interval (closed interval), VALID_UP_TO is excluded from valid_price period interval (open interval), and NULL in either date column means "from/to infinity".
create table prices ( item varchar2(10) not null , valid_from date , valid_up_to date , price number not null , period for valid_price (valid_from, valid_up_to) ) / insert into prices values ('Santabeard', null , date '2016-12-11', 100) / insert into prices values ('Santabeard', date '2016-12-11', date '2016-12-27', 75) / insert into prices values ('Santabeard', date '2016-12-27', null , 105) / insert into prices values ('Beachball' , null , date '2016-11-10', 14) / insert into prices values ('Beachball' , date '2016-11-10', date '2016-12-01', 12) / insert into prices values ('Beachball' , date '2016-12-01', null , 15) / insert into prices values ('Chessgame' , null , date '2016-12-16', 23) / insert into prices values ('Chessgame' , date '2016-12-16', null , 21) / commit /
So let's just see an overview of the data:
alter session set nls_date_format = 'YYYY-MM-DD' / select * from prices order by item, valid_from nulls first /
ITEM VALID_FROM VALID_UP_T PRICE ---------- ---------- ---------- ---------- Beachball 2016-11-10 14 Beachball 2016-11-10 2016-12-01 12 Beachball 2016-12-01 15 Chessgame 2016-12-16 23 Chessgame 2016-12-16 21 Santabeard 2016-12-11 100 Santabeard 2016-12-11 2016-12-27 75 Santabeard 2016-12-27 105
With AS OF we can view valid prices at a specific point in time:
select * from prices as of period for valid_price date '2016-12-12' order by item, valid_from nulls first /
ITEM VALID_FROM VALID_UP_T PRICE ---------- ---------- ---------- ---------- Beachball 2016-12-01 15 Chessgame 2016-12-16 23 Santabeard 2016-12-11 2016-12-27 75
Typically used in a query for "currently valid price":
select * from prices as of period for valid_price sysdate order by item, valid_from nulls first /
ITEM VALID_FROM VALID_UP_T PRICE ---------- ---------- ---------- ---------- Beachball 2016-12-01 15 Chessgame 2016-12-16 21 Santabeard 2016-12-27 105
Note that VALID_UP_TO being "open-ended interval" means the Santabead price 100 (that has VALID_UP_TO value of 2016-12-11 00:00:00) is valid at 1 second before midnight:
select * from prices as of period for valid_price to_date('2016-12-10 23:59:59', 'YYYY-MM-DD HH24:MI:SS') where item = 'Santabeard' order by item, valid_from nulls first /
ITEM VALID_FROM VALID_UP_T PRICE ---------- ---------- ---------- ---------- Santabeard 2016-12-11 100
But at midnight, the price 100 is no longer valid, instead the price is 75:
select * from prices as of period for valid_price to_date('2016-12-11 00:00:00', 'YYYY-MM-DD HH24:MI:SS') where item = 'Santabeard' order by item, valid_from nulls first /
ITEM VALID_FROM VALID_UP_T PRICE ---------- ---------- ---------- ---------- Santabeard 2016-12-11 2016-12-27 75
While AS OF gives valid price at specific point in time, VERSIONS BETWEEN gives all prices that are valid within a time period:
select * from prices versions period for valid_price between to_date('2016-12-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS') and to_date('2016-12-31 23:59:59', 'YYYY-MM-DD HH24:MI:SS') order by item, valid_from nulls first, valid_up_to nulls last /
ITEM VALID_FROM VALID_UP_T PRICE ---------- ---------- ---------- ---------- Beachball 2016-12-01 15 Chessgame 2016-12-16 23 Chessgame 2016-12-16 21 Santabeard 2016-12-11 100 Santabeard 2016-12-11 2016-12-27 75 Santabeard 2016-12-27 105
We can use analytic LEAD function to have the VALID_UP_TO and PRICE of the "old" row and the VALID_FROM and PRICE of the "new" row together
select item , valid_up_to as old_valid_up_to , price as old_price , lead(valid_from) over ( partition by item order by valid_up_to ) as new_valid_from , lead(price) over ( partition by item order by valid_up_to ) as new_price from prices versions period for valid_price between to_date('2016-12-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS') and to_date('2016-12-31 23:59:59', 'YYYY-MM-DD HH24:MI:SS') order by item, valid_from nulls first, valid_up_to nulls last /
ITEM OLD_VALID_ OLD_PRICE NEW_VALID_ NEW_PRICE ---------- ---------- ---------- ---------- ---------- Beachball 15 Chessgame 2016-12-16 23 2016-12-16 21 Chessgame 21 Santabeard 2016-12-11 100 2016-12-11 75 Santabeard 2016-12-27 75 2016-12-27 105 Santabeard 105
Which can lead us to a query showing "price changes in December":
select item, old_valid_up_to, old_price, new_valid_from, new_price from ( select item , valid_up_to as old_valid_up_to , price as old_price , lead(valid_from) over ( partition by item order by valid_up_to ) as new_valid_from , lead(price) over ( partition by item order by valid_up_to ) as new_price from prices versions period for valid_price between to_date('2016-12-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS') and to_date('2016-12-31 23:59:59', 'YYYY-MM-DD HH24:MI:SS') ) where new_price is not null order by item, old_valid_up_to /
ITEM OLD_VALID_ OLD_PRICE NEW_VALID_ NEW_PRICE ---------- ---------- ---------- ---------- ---------- Chessgame 2016-12-16 23 2016-12-16 21 Santabeard 2016-12-11 100 2016-12-11 75 Santabeard 2016-12-27 75 2016-12-27 105
All good - we have a "change report" query. But now let's look at valid prices in November instead of December:
select * from prices versions period for valid_price between to_date('2016-11-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS') and to_date('2016-11-30 23:59:59', 'YYYY-MM-DD HH24:MI:SS') order by item, valid_from nulls first, valid_up_to nulls last /
ITEM VALID_FROM VALID_UP_T PRICE ---------- ---------- ---------- ---------- Beachball 2016-11-10 14 Beachball 2016-11-10 2016-12-01 12 Chessgame 2016-12-16 23 Santabeard 2016-12-11 100
And then we try the "change report" for November:
select item, old_valid_up_to, old_price, new_valid_from, new_price from ( select item , valid_up_to as old_valid_up_to , price as old_price , lead(valid_from) over ( partition by item order by valid_up_to ) as new_valid_from , lead(price) over ( partition by item order by valid_up_to ) as new_price from prices versions period for valid_price between to_date('2016-11-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS') and to_date('2016-11-30 23:59:59', 'YYYY-MM-DD HH24:MI:SS') ) where new_price is not null order by item, old_valid_up_to /
ITEM OLD_VALID_ OLD_PRICE NEW_VALID_ NEW_PRICE ---------- ---------- ---------- ---------- ---------- Beachball 2016-11-10 14 2016-11-10 12
OK, we've now shown the change report for the entirety of November, which reports a single price change for Beachball. Before we had the change report for the entirety of December, which reports no price changes for Beachball.
But aren't we missing something? How about if we make a change report for November+December together?
select item, old_valid_up_to, old_price, new_valid_from, new_price from ( select item , valid_up_to as old_valid_up_to , price as old_price , lead(valid_from) over ( partition by item order by valid_up_to ) as new_valid_from , lead(price) over ( partition by item order by valid_up_to ) as new_price from prices versions period for valid_price between to_date('2016-11-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS') and to_date('2016-12-31 23:59:59', 'YYYY-MM-DD HH24:MI:SS') ) where new_price is not null order by item, old_valid_up_to /
ITEM OLD_VALID_ OLD_PRICE NEW_VALID_ NEW_PRICE ---------- ---------- ---------- ---------- ---------- Beachball 2016-11-10 14 2016-11-10 12 Beachball 2016-12-01 12 2016-12-01 15 Chessgame 2016-12-16 23 2016-12-16 21 Santabeard 2016-12-11 100 2016-12-11 75 Santabeard 2016-12-27 75 2016-12-27 105
This time we spot that there was a change for Beachball from a price of 12 valid up to (not including) 2016-12-01 to a price of 15 valid from 2016-12-01. Even though our two previous change reports covered the same time period, neither reported this change.
The "problem" is the price interval "ends" match completely the intervals we are using in our November and December price change report, thus we won't get "old" and "new" row in the same set of data.
If we want such a "price change" report, we'll have to use a "closed" interval for the month we are reporting for, so that when we do next months report, we'll actually have a one second "overlap" with the report from this month:
select item, old_valid_up_to, old_price, new_valid_from, new_price from ( select item , valid_up_to as old_valid_up_to , price as old_price , lead(valid_from) over ( partition by item order by valid_up_to ) as new_valid_from , lead(price) over ( partition by item order by valid_up_to ) as new_price from prices versions period for valid_price between to_date('2016-11-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS') and to_date('2016-12-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS') ) where new_price is not null order by item, old_valid_up_to /
ITEM OLD_VALID_ OLD_PRICE NEW_VALID_ NEW_PRICE ---------- ---------- ---------- ---------- ---------- Beachball 2016-11-10 14 2016-11-10 12 Beachball 2016-12-01 12 2016-12-01 15
This isn't demonstrating any problem with temporal validity. The interval definitions are as they have to be and temporal validity is simply an easier way to deal with these things than doing it yourself with predicates involving proper combinations of AND, OR, >=, < and IS NULL.
But no matter if you do it yourself or you do it with temporal validity, you should beware of how "closed" and "open" intervals work and why the "semi-closed" intervals in temporal validity sometimes can fool you, like missing that I'd actually need "overlap" when doing an otherwise seemingly straight-forward "price change report".
Thank $deity for sharp Dev Gym players - I can always count on them to discover the little quirks I miss myself.
Hi Kim,
ReplyDeleteBefore, there was a change that you didn't report in either month. Now that change is reported in both months.
Personally, as a user I would want the change reported once, in this case in December when the new price became effective.
select * from (
select ITEM, VALID_UP_TO price_change_date, PRICE old_price,
lead(price) over(partition by item order by valid_from nulls first) new_price
from prices
where (valid_up_to is null or valid_up_to >= date '2016-12-01')
and (valid_from is null or valid_from < add_months(date '2016-12-01',1))
)
where new_price is not null
order by item, price_change_date;
Of course, this doesn't use temporal validity syntax anymore, which is unfortunate...
Best regards, Stew
Hi, Stew
DeleteI figured it was just a matter of time before you would comment - you were just faster than I thought ;-)
Anyway - no, my little "1 second overlap" does not report change in both months, only one. I do my November change report as shown in the blog post with:
from prices versions period for valid_price
between to_date('2016-11-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
and to_date('2016-12-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
That shows 2 Beachball price changes - one in the month, one at the month switch.
Then I do my December change report with:
from prices versions period for valid_price
between to_date('2016-12-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
and to_date('2017-01-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
And it does not show any beachball price changes.
You're right, though, that from a user perspective, it would probably be better to have my one-second overlap in the other direction, so it shows in December change report instead.
from prices versions period for valid_price
between date '2016-12-01' - interval '1' second
and date '2016-12-01' + interval '1' month - interval '1' second
That'll also do the trick and get me same results as your query.
For this use case, I think your query is "neater". Having to do this "trick" of a one-second overlap is kind of unfortunate for temporal validity, but I don't see a way that it could be different.
Main point is to beware of this, as temporal validity syntax makes it easy to "fall into the trap" that I did ;-)
Thanks for the comment
/Kim
Hi Kim & Stew,
ReplyDeleteThanks a lot for your thorough analysis :)
The problem arises from the usual speaking (and Oracle !) semantics of the word BETWEEN,
which means an interval closed at both edges.
Thus, the combination of BETWEEN with the temporal validity semi-closed interval logic,
like in the VERSIONS PERIOD FOR ... BETWEEN query is the cause of the "missing edge"
effect that we see.
Temporal validity seems "more naturally suited" for the one-point-in-time
AS OF PERIOD FOR queries than for the BETWEEN closed interval logic.
It is always easier "to philosophize" after, than while playing a quiz in a competition :)
And, just like Kim, I myself should thank the "$deity" once again
for having given us the treasure of Kim's quizzes and web posts :) :) :)
Cheers & Best Regards,
Iudith