Home > database >  SQL Find the daily maximum units from a table which stores transactions
SQL Find the daily maximum units from a table which stores transactions

Time:11-28

I have an SQL Table which stores the units (inventory) of items at any given timestamp. Any transaction(add/delete) on an item basically updates this table with the new quantity and the timestamp of occurrence.

update_timestamp  item_id  units
  1637993217       item1     3
  1637993227       item2     1
  1637993117       item1     2
  1637993237       item1     5

I need to fetch the daily maximum units for every item from this table.

The query I am using is something similar to this :

SELECT date_format(from_unixtime((CAST(update_timestamp AS BIGINT))/1000),'%Y-%m-%d') AS day,
item_id,
MAX(units) as max_units
from Table
group by item_id, day;

which gives an output like:

   day         item_id    max_units
2021-11-23      item1         5
2021-11-24      item1         6
2021-11-23      item2         3
....
....

However when generating the output, I also need to account for the units carrying forward from the balance of the transaction previous to my current day.
Example : For item1, there were few transactions on day 2021-11-24 and the quantity at the end of that day was 6. Now if the next transaction(s) on this item occurred only on 2021-11-26, and say were in the following sequence for this date : [ 4, 2, 3 ]. Then 6 should continue to be the maximum units of the item for the days 2021-11-25 and 2021-11-26 as well.

I am stuck here and unable to get it working through SQL. Currently how I am approaching this is by fetching the last transaction for every day separately, and then using python scripts to forward-fill this data for next days, which is not clean and scalable in my case.

I am running queries on Presto SQL engine.

CodePudding user response:

You can use lag window function to get previous value and select maximum between it and current one:

WITH dataset (update_timestamp, item_id, units) AS (
    VALUES (timestamp '2021-11-21 00:00:01', 'item1', 10),
        (timestamp '2021-11-23 00:00:02', 'item1', 6),
        (timestamp '2021-11-23 00:00:03', 'item2', 1),
        (timestamp '2021-11-24 00:00:01', 'item1', 2),
        (timestamp '2021-11-24 00:00:04', 'item1', 5)
)
SELECT item_id,
    day,
    coalesce( -- greatest will return NULL if one of the arguments is NULL so fallback to "current"
        greatest(
            max_units,
            lag(max_units) over (
                partition by item_id
                order by day
            )
        ),
        max_units
    ) as max_units
FROM (
        SELECT item_id,
            date_trunc('day', update_timestamp) day,
            max(units) as max_units
        FROM dataset
        GROUP BY item_id,
            date_trunc('day', update_timestamp)
    )

Output:

item_id day max_units
item2 2021-11-23 00:00:00.000 1
item1 2021-11-21 00:00:00.000 10
item1 2021-11-23 00:00:00.000 10
item1 2021-11-24 00:00:00.000 6

CodePudding user response:

I think my answer is really close to Guru's. I made an assumption that you might need to fill in dates that were missing, so created a calendar table - replace with whatever you want.

This was written in BigQuery, so not sure if it will compile/execute in Presto but I think they are syntactically close.

   with transactions as (
        select cast('2021-11-17' as date) as update_timestamp, 'item1' as item_id, 3 as units union all
        select cast('2021-11-18' as date), 'item2', 1 union all
        select cast('2021-11-18' as date), 'item2', 5 union all
        select cast('2021-11-20' as date), 'item1', 2 union all
        select cast('2021-11-20' as date), 'item2', 3 union all
        select cast('2021-11-20' as date), 'item2', 2 union all
        select cast('2021-11-20' as date), 'item1', 10 union all
        select cast('2021-11-24' as date), 'item1', 8 union all
        select cast('2021-11-24' as date), 'item1', 5
    ),
    
    some_calendar_table AS (
        SELECT cast(d as date) as cal_date
        FROM UNNEST(GENERATE_DATE_ARRAY('2021-11-15', '2021-11-30', INTERVAL 1 DAY)) AS d 
    ),

    daily_transaction_max as (
        SELECT update_timestamp AS transaction_date,
        item_id,
        MAX(units) as max_value
        from transactions
        group by item_id, transaction_date
    )


  select cal.cal_date
  , t.item_id
  , mt.max_value as max_inventory_from_this_dates_transactions
  , greatest(coalesce(mt.max_value, 0), coalesce(last_value(mt.max_value ignore nulls) over(partition by t.item_id 
        order by cal.cal_date 
        rows between unbounded preceding and 1 preceding)
  , 0)) as max_daily_inventory
  from some_calendar_table cal
  cross join (select distinct item_id from daily_transaction_max) t
  left join daily_transaction_max mt
    on mt.transaction_date = cal.cal_date
    and mt.item_id = t.item_id
  order by t.item_id, cal.cal_date
  • Related