How to roll up a Time-series like table in MySQL to get the most up date value that is not null?-CodePudding

I have a table of time-series data like data in SQL which follows the following format:

Note: Each customer_ID can have multiple transaction_IDs

Customer_ID	transaction_ID	Timestamp	Value1	Value2
1	1	01/01/2022 17:00:00	1	NULL
1	1	01/01/2022 17:05:00	NULL	Foo
1	1	01/01/2022 17:10:00	NULL	Bar
1	1	01/01/2022 17:15:00	2	NULL
1	2	01/01/2022 17:20:00	NULL	Wolf

I want to create a view based on this data which follows the following format:

Customer_ID	transaction_ID	Timestamp	Value1	Value2
1	1	01/01/2022 17:00:00	1	NULL
1	1	01/01/2022 17:05:00	1	Foo
1	1	01/01/2022 17:10:00	1	Bar
1	1	01/01/2022 17:15:00	2	Bar
1	2	01/01/2022 17:20:00	NULL	Wolf

Essentially I want to "roll" the data up so that the Value1 and Value2 are the most recent value for that transaction_id at a given timestamp.

I have tried things like PARTITION BY OVER statements but it this concatenates the values into a list instead of giving the most recent value(for strings) or their sum(for numeric values).

SELECT * FROM
(SELECT
transaction_id,
timestamp,
STRING_AGG(Value1) OVER(PARTITION BY transaction_id) AS Value1,
STRING_AGG(Value2) OVER(PARTITION BY transaction_id) AS Value2
FROM Database;

CodePudding user response：

Traditional way (use subqueries as column expressions):

select
  t1.Customer_Id
  , t1.transaction_id
  , t1.Timestamp
  , (select t2.value1
        from testable t2 
        where t2.Customer_Id=t1.Customer_Id
        and t2.transaction_Id=T1.transaction_Id
        and t2.TimeStamp=(select max(TimeStamp) as V1_TS
                         from testable t3
                         where t3.Customer_Id=t2.Customer_Id
                         and t3.transaction_Id=t2.transaction_Id
                         and t3.Value1 is not null
                         )
       ) LastVal1
  , (select t2.value2
        from testable t2 
        where t2.Customer_Id=t1.Customer_Id
        and t2.transaction_Id=T1.transaction_Id
        and t2.TimeStamp=(select max(TimeStamp) as V1_TS
                         from testable t3
                         where t3.Customer_Id=t2.Customer_Id
                         and t3.transaction_Id=t2.transaction_Id
                         and t3.Value2 is not null
                         )
       ) LastVal2
  from testable t1

CodePudding user response：

To create a common value each row shares with the row the substitute should come from, you can first use the windowed version of sum() calculating a cumulative sum. Use value1 IS NOT NULL (or value2 IS NOT NULL), which as a Boolean expression in numerical context will be implicitly casted to 0 or 1 in MySQL. (In other flavors of SQL (and also in MySQL itself) this can be done explicitly by using a CASE expression.)

Then you can use this common value to partition by and use the first_value() window function to get the substitute value which will be in the, be the order of the timestamps, first row of the rows sharing a common value, i.e. in the partition.

Something along the lines of:

SELECT customer_id,
       transaction_id,
       timestamp,
       first_value(value1) OVER (PARTITION BY transaction_id,
                                              cv1
                                 ORDER BY timestamp ASC) AS value1,
       first_value(value2) OVER (PARTITION BY transaction_id,
                                              cv2
                                 ORDER BY timestamp ASC) AS value2
       FROM (SELECT customer_id,
                    transaction_id,
                    timestamp,
                    value1,
                    value2,
                    sum(value1 IS NOT NULL) OVER (PARTITION BY transaction_id
                                                  ORDER BY timestamp ASC) AS cv1,
                    sum(value2 IS NOT NULL) OVER (PARTITION BY transaction_id
                                                  ORDER BY timestamp ASC) AS cv2
                    FROM elbat) x;

(Untested because you failed to deliver the example as consumable DDL and DML as you should have.)

CodePudding user response：

An old method (also works in MySql 7.x), is to use variables.

select Customer_ID, transaction_ID, Timestamp, Value1, Value2
from
(
  select Customer_ID
  , Timestamp
  , case 
    when (value1 is not null or transaction_ID != @transId) 
     and @val1 := value1
    then value1
    else @val1
    end as Value1
  , case 
    when (value2 is not null or transaction_ID != @transId) 
     and @val2 := value2
    then value2
    else @val2
    end as Value2
  , @transId := transaction_ID as transaction_ID
  from your_table
  cross join (select @transId:=0, @val1:=0, @val2:='') vals
  order by Customer_ID, transaction_ID, Timestamp
) q

Customer_ID	transaction_ID	Timestamp	Value1	Value2
1	1	2022-01-01 17:00:00	1	null
1	1	2022-01-01 17:05:00	1	Foo
1	1	2022-01-01 17:10:00	1	Bar
1	1	2022-01-01 17:15:00	2	Bar
1	2	2022-01-01 17:20:00	null	Wolf

Side-note, the sort order in the sub-query matters in this method.

Another method is to use correlated sub-queries

select Customer_ID, transaction_ID, Timestamp
, coalesce(Value1, (select t2.Value1 
   from your_table t2 
   where t2.transaction_ID = t.transaction_ID
   and t2.Value1 is not null
   and t2.Timestamp < t.Timestamp
   order by t2.Timestamp desc
   limit 1
  )) as Value1
, coalesce(Value2, (select t2.Value2
   from your_table t2 
   where t2.transaction_ID = t.transaction_ID
   and t2.Value2 is not null
   and t2.Timestamp < t.Timestamp
   order by t2.Timestamp desc
   limit 1
  )) as Value2
from your_table t
order by Customer_ID, transaction_ID, Timestamp;

Demo on db<>fiddle here