Home > Enterprise >  How to roll up a Time-series like table in MySQL to get the most up date value that is not null?
How to roll up a Time-series like table in MySQL to get the most up date value that is not null?

Time:01-04

I have a table of time-series data like data in SQL which follows the following format:

Note: Each customer_ID can have multiple transaction_IDs

Customer_ID transaction_ID Timestamp Value1 Value2
1 1 01/01/2022 17:00:00 1 NULL
1 1 01/01/2022 17:05:00 NULL Foo
1 1 01/01/2022 17:10:00 NULL Bar
1 1 01/01/2022 17:15:00 2 NULL
1 2 01/01/2022 17:20:00 NULL Wolf

I want to create a view based on this data which follows the following format:

Customer_ID transaction_ID Timestamp Value1 Value2
1 1 01/01/2022 17:00:00 1 NULL
1 1 01/01/2022 17:05:00 1 Foo
1 1 01/01/2022 17:10:00 1 Bar
1 1 01/01/2022 17:15:00 2 Bar
1 2 01/01/2022 17:20:00 NULL Wolf

Essentially I want to "roll" the data up so that the Value1 and Value2 are the most recent value for that transaction_id at a given timestamp.

I have tried things like PARTITION BY OVER statements but it this concatenates the values into a list instead of giving the most recent value(for strings) or their sum(for numeric values).

SELECT * FROM
(SELECT
transaction_id,
timestamp,
STRING_AGG(Value1) OVER(PARTITION BY transaction_id) AS Value1,
STRING_AGG(Value2) OVER(PARTITION BY transaction_id) AS Value2
FROM Database;

CodePudding user response:

Traditional way (use subqueries as column expressions):

select
  t1.Customer_Id
  , t1.transaction_id
  , t1.Timestamp
  , (select t2.value1
        from testable t2 
        where t2.Customer_Id=t1.Customer_Id
        and t2.transaction_Id=T1.transaction_Id
        and t2.TimeStamp=(select max(TimeStamp) as V1_TS
                         from testable t3
                         where t3.Customer_Id=t2.Customer_Id
                         and t3.transaction_Id=t2.transaction_Id
                         and t3.Value1 is not null
                         )
       ) LastVal1
  , (select t2.value2
        from testable t2 
        where t2.Customer_Id=t1.Customer_Id
        and t2.transaction_Id=T1.transaction_Id
        and t2.TimeStamp=(select max(TimeStamp) as V1_TS
                         from testable t3
                         where t3.Customer_Id=t2.Customer_Id
                         and t3.transaction_Id=t2.transaction_Id
                         and t3.Value2 is not null
                         )
       ) LastVal2
  from testable t1

CodePudding user response:

To create a common value each row shares with the row the substitute should come from, you can first use the windowed version of sum() calculating a cumulative sum. Use value1 IS NOT NULL (or value2 IS NOT NULL), which as a Boolean expression in numerical context will be implicitly casted to 0 or 1 in MySQL. (In other flavors of SQL (and also in MySQL itself) this can be done explicitly by using a CASE expression.)

Then you can use this common value to partition by and use the first_value() window function to get the substitute value which will be in the, be the order of the timestamps, first row of the rows sharing a common value, i.e. in the partition.

Something along the lines of:

SELECT customer_id,
       transaction_id,
       timestamp,
       first_value(value1) OVER (PARTITION BY transaction_id,
                                              cv1
                                 ORDER BY timestamp ASC) AS value1,
       first_value(value2) OVER (PARTITION BY transaction_id,
                                              cv2
                                 ORDER BY timestamp ASC) AS value2
       FROM (SELECT customer_id,
                    transaction_id,
                    timestamp,
                    value1,
                    value2,
                    sum(value1 IS NOT NULL) OVER (PARTITION BY transaction_id
                                                  ORDER BY timestamp ASC) AS cv1,
                    sum(value2 IS NOT NULL) OVER (PARTITION BY transaction_id
                                                  ORDER BY timestamp ASC) AS cv2
                    FROM elbat) x;

(Untested because you failed to deliver the example as consumable DDL and DML as you should have.)

CodePudding user response:

An old method (also works in MySql 7.x), is to use variables.

select Customer_ID, transaction_ID, Timestamp, Value1, Value2
from
(
  select Customer_ID
  , Timestamp
  , case 
    when (value1 is not null or transaction_ID != @transId) 
     and @val1 := value1
    then value1
    else @val1
    end as Value1
  , case 
    when (value2 is not null or transaction_ID != @transId) 
     and @val2 := value2
    then value2
    else @val2
    end as Value2
  , @transId := transaction_ID as transaction_ID
  from your_table
  cross join (select @transId:=0, @val1:=0, @val2:='') vals
  order by Customer_ID, transaction_ID, Timestamp
) q
Customer_ID transaction_ID Timestamp Value1 Value2
1 1 2022-01-01 17:00:00 1 null
1 1 2022-01-01 17:05:00 1 Foo
1 1 2022-01-01 17:10:00 1 Bar
1 1 2022-01-01 17:15:00 2 Bar
1 2 2022-01-01 17:20:00 null Wolf

Side-note, the sort order in the sub-query matters in this method.

Another method is to use correlated sub-queries

select Customer_ID, transaction_ID, Timestamp
, coalesce(Value1, (select t2.Value1 
   from your_table t2 
   where t2.transaction_ID = t.transaction_ID
   and t2.Value1 is not null
   and t2.Timestamp < t.Timestamp
   order by t2.Timestamp desc
   limit 1
  )) as Value1
, coalesce(Value2, (select t2.Value2
   from your_table t2 
   where t2.transaction_ID = t.transaction_ID
   and t2.Value2 is not null
   and t2.Timestamp < t.Timestamp
   order by t2.Timestamp desc
   limit 1
  )) as Value2
from your_table t
order by Customer_ID, transaction_ID, Timestamp;

Demo on db<>fiddle here

  •  Tags:  
  • Related