SQL Tuning Golf, anyone?
Golf is one of the very few sports where the goal is the lowest possible score.
Code golf is about solving a given algorithm using the fewest possible characters.
It can be very fun indeed to play code golf, also in SQL. But for real life it might be useful to aim at lowest possible score using a different measurement than number of characters...Recently Toon Koppelaars blogged a little SQL challenge on distributing arrivals over timeslots. Several interesting solutions are shown in the comment track on his blog post as well as on the accompanying Twitter thread.
Some of the solutions turned into a kind of code golf, trying to solve it with as short a piece of SQL as possible. Great fun 😁.
But then I thought, how about code golf with a different measure?
I contributed this solution to Toons challenge:
select /*+ gather_plan_statistics */ to_char(slottime, 'YYYY-MM-DD') as day , to_char(slottime, 'HH24:MI') || '-' || to_char(slottime + interval '5' minute, 'HH24:MI') as slot , article_id , sum(quantity) as quantity from ( select s.slottime , case when s.slotno <= a.slots then a.article_id else null end as article_id , case when s.slotno <= mod(a.quantity, a.slots) then trunc(a.quantity / a.slots) + 1 when s.slotno <= a.slots then trunc(a.quantity / a.slots) else null end as quantity from ( select arrival_starttime , round((arrival_stoptime - arrival_starttime) * (24*60/5)) as slots , nvl(round((lead(arrival_starttime) over ( partition by trunc(arrival_starttime) order by arrival_starttime ) - arrival_starttime) * (24*60/5)), 0) as slots_to_next , article_id , quantity from orderline_arrivals oa ) a cross apply ( select level as slotno , a.arrival_starttime + numtodsinterval(5 * (level-1), 'minute') as slottime from dual connect by level <= greatest(a.slots, a.slots_to_next) ) s ) group by slottime, article_id order by slottime, article_id;
It gives the desired output:
DAY SLOT ARTICLE_ID QUANTITY ---------- ----------- ---------- ---------- 2021-03-18 08:00-08:05 A1 4 2021-03-18 08:00-08:05 A2 7 2021-03-18 08:05-08:10 A1 3 2021-03-18 08:05-08:10 A2 7 2021-03-18 08:10-08:15 A1 6 2021-03-18 08:10-08:15 A2 6 2021-03-18 08:15-08:20 A1 2 2021-03-18 08:20-08:25 2021-03-18 08:25-08:30 2021-03-18 08:30-08:35 A3 25 2021-03-18 08:35-08:40 A3 25 11 rows selected.
When I run it using the /*+ gather_plan_statistics */ hint as above, I can query the access plan:
select * from dbms_xplan.display_cursor(format=>'ALLSTATS LAST');
And I get a plan with some statistics included:
PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------------------------------------------------------------------- SQL_ID 2b9jgps3txryk, child number 0 ------------------------------------- select /*+ gather_plan_statistics */ to_char(slottime, 'YYYY-MM-DD') as day , to_char(slottime, 'HH24:MI') || '-' || to_char(slottime + interval '5' minute, 'HH24:MI') as slot , article_id , sum(quantity) as quantity from ( select s.slottime , case when s.slotno <= a.slots then a.article_id else null end as article_id , case when s.slotno <= mod(a.quantity, a.slots) then trunc(a.quantity / a.slots) + 1 when s.slotno <= a.slots then trunc(a.quantity / a.slots) else null end as quantity from ( select arrival_starttime , round((arrival_stoptime - arrival_starttime) * (24*60/5)) as slots , nvl(round((lead(arrival_starttime) over ( partition by trunc(arrival_starttime) order by arrival_starttime ) - arrival_starttime) * (24*60/5)), 0) as slots_to_next , article_id , quantity from orderline_arrivals oa ) Plan hash value: 3571017931 -------------------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem | -------------------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 11 |00:00:00.01 | 7 | | | | | 1 | SORT ORDER BY | | 1 | 4 | 11 |00:00:00.01 | 7 | 2048 | 2048 | 2048 (0)| | 2 | HASH GROUP BY | | 1 | 4 | 11 |00:00:00.01 | 7 | 1088K| 1088K| 1063K (0)| | 3 | NESTED LOOPS | | 1 | 4 | 12 |00:00:00.01 | 7 | | | | | 4 | VIEW | | 1 | 4 | 4 |00:00:00.01 | 7 | | | | | 5 | WINDOW SORT | | 1 | 4 | 4 |00:00:00.01 | 7 | 2048 | 2048 | 2048 (0)| | 6 | TABLE ACCESS FULL | ORDERLINE_ARRIVALS | 1 | 4 | 4 |00:00:00.01 | 7 | | | | | 7 | VIEW | VW_LAT_3CA82F67 | 4 | 1 | 12 |00:00:00.01 | 0 | | | | | 8 | CONNECT BY WITHOUT FILTERING| | 4 | | 12 |00:00:00.01 | 0 | 2048 | 2048 | 2048 (0)| | 9 | FAST DUAL | | 4 | 1 | 4 |00:00:00.01 | 0 | | | | -------------------------------------------------------------------------------------------------------------------------------------------- Note ----- - dynamic statistics used: dynamic sampling (level=2)
When I'm tuning a SQL statement, basically I want it to spend as little time as possible, using as few resources as possible. Or in other words, get lowest possible values of A-Time, Buffers, *Mem columns. Often reduction of number of operations in the plan is helpful to achieve this goal.
Sometimes that's a compromise, of course, as you might need to expend more Mem to get a low A-Time, or more A-Time to get low Buffers, or more operations but each doing much less work, etc. It'll depend on actual use case which measure is most important to reduce.
So isn't SQL tuning really just a different kind of code golf?
Let's introduce a new game:
SQL Tuning Golf
With some rules along these lines:
- Each game defines:
- The task/algorithm to be solved.
- Representative data to be used.
- The correct output for the given set of data.
- Which plan statistics measure is to be minimized.
- Each solution must show:
- That it produces the correct output for the given data.
- A plan statistics output for measuring the value of the attempted minimization.
- Of the solutions, the winner is the one with the minimum value of the desired measure.
So for example Toons challenge in the above blogpost might have been issued as a SQL Tuning Golf game asking for minimization of Buffers measure.
There might be additional rules that needs to be introduced for specific SQL Tuning Golf games, for example whether or not additional object creation is allowed (like indexes) or changing table definition (like partitioning). And if multi-step solutions are allowed (like creating materialized view or temporary table and querying those) and how the total measure then is calculated (like sum of Buffers used by mview creation plus query.)
Potentially that could become very complicated with huge arguments, I realize that. So I think game creators should try (at least at first) to keep it simple. KISS is a good principle, also in game creation 😉.
Anyway, I think I'll try and think of a few SQL Tuning Golf games and put 'em out there (probably on LiveSQL or maybe it can be introduced on the Oracle DevGym.)
Comments
Post a Comment