I have an SQL Table which stores the units (inventory) of items at any given timestamp. Any transaction(add/delete) on an item basically updates this table with the new quantity and the timestamp of occurrence.
update_timestamp item_id units
1637993217 item1 3
1637993227 item2 1
1637993117 item1 2
1637993237 item1 5
I need to fetch the daily maximum units for every item from this table.
The query I am using is something similar to this :
SELECT date_format(from_unixtime((CAST(update_timestamp AS BIGINT))/1000),'%Y-%m-%d') AS day,
item_id,
MAX(units) as max_units
from Table
group by item_id, day;
which gives an output like:
day item_id max_units
2021-11-23 item1 5
2021-11-24 item1 6
2021-11-23 item2 3
....
....
However when generating the output, I also need to account for the units carrying forward from the balance of the transaction previous to my current day.
Example : For item1, there were few transactions on day 2021-11-24
and the quantity at the end of that day was 6
. Now if the next transaction(s) on this item occurred only on 2021-11-26
, and say were in the following sequence for this date : [ 4, 2, 3 ]
. Then 6
should continue to be the maximum units of the item for the days 2021-11-25
and 2021-11-26
as well.
I am stuck here and unable to get it working through SQL. Currently how I am approaching this is by fetching the last transaction for every day separately, and then using python scripts to forward-fill this data for next days, which is not clean and scalable in my case.
I am running queries on Presto SQL engine.
CodePudding user response:
You can use lag
window function to get previous value and select maximum between it and current one:
WITH dataset (update_timestamp, item_id, units) AS (
VALUES (timestamp '2021-11-21 00:00:01', 'item1', 10),
(timestamp '2021-11-23 00:00:02', 'item1', 6),
(timestamp '2021-11-23 00:00:03', 'item2', 1),
(timestamp '2021-11-24 00:00:01', 'item1', 2),
(timestamp '2021-11-24 00:00:04', 'item1', 5)
)
SELECT item_id,
day,
coalesce( -- greatest will return NULL if one of the arguments is NULL so fallback to "current"
greatest(
max_units,
lag(max_units) over (
partition by item_id
order by day
)
),
max_units
) as max_units
FROM (
SELECT item_id,
date_trunc('day', update_timestamp) day,
max(units) as max_units
FROM dataset
GROUP BY item_id,
date_trunc('day', update_timestamp)
)
Output:
item_id | day | max_units |
---|---|---|
item2 | 2021-11-23 00:00:00.000 | 1 |
item1 | 2021-11-21 00:00:00.000 | 10 |
item1 | 2021-11-23 00:00:00.000 | 10 |
item1 | 2021-11-24 00:00:00.000 | 6 |
CodePudding user response:
I think my answer is really close to Guru's. I made an assumption that you might need to fill in dates that were missing, so created a calendar table - replace with whatever you want.
This was written in BigQuery, so not sure if it will compile/execute in Presto but I think they are syntactically close.
with transactions as (
select cast('2021-11-17' as date) as update_timestamp, 'item1' as item_id, 3 as units union all
select cast('2021-11-18' as date), 'item2', 1 union all
select cast('2021-11-18' as date), 'item2', 5 union all
select cast('2021-11-20' as date), 'item1', 2 union all
select cast('2021-11-20' as date), 'item2', 3 union all
select cast('2021-11-20' as date), 'item2', 2 union all
select cast('2021-11-20' as date), 'item1', 10 union all
select cast('2021-11-24' as date), 'item1', 8 union all
select cast('2021-11-24' as date), 'item1', 5
),
some_calendar_table AS (
SELECT cast(d as date) as cal_date
FROM UNNEST(GENERATE_DATE_ARRAY('2021-11-15', '2021-11-30', INTERVAL 1 DAY)) AS d
),
daily_transaction_max as (
SELECT update_timestamp AS transaction_date,
item_id,
MAX(units) as max_value
from transactions
group by item_id, transaction_date
)
select cal.cal_date
, t.item_id
, mt.max_value as max_inventory_from_this_dates_transactions
, greatest(coalesce(mt.max_value, 0), coalesce(last_value(mt.max_value ignore nulls) over(partition by t.item_id
order by cal.cal_date
rows between unbounded preceding and 1 preceding)
, 0)) as max_daily_inventory
from some_calendar_table cal
cross join (select distinct item_id from daily_transaction_max) t
left join daily_transaction_max mt
on mt.transaction_date = cal.cal_date
and mt.item_id = t.item_id
order by t.item_id, cal.cal_date