Home > Mobile >  Transforming a log file that tracks changes with timestamps to a valid-from valid-to type table
Transforming a log file that tracks changes with timestamps to a valid-from valid-to type table

Time:01-03

I have a kind of log table that I would like to transform using SQL Server - adding a validity interval (valid-from, valid-to fields) to records.

The source table is rather large (100K records), so an optimal solution would be required. It is acutally a table that tracks content (status) changes in a field for individual records.

Illustration of the source table:

RecordId DateStamp OldValue Newvalue
1 2021.01.01. A Value Another Value
1 2021.02.01 Another Value Yet Another Value
1 2021.03.01 Yet Another Value A Value
1 2021.04.01 A Value Yet Another Value
4 2021.01.01 B Value Next Value
4 2021.02.01 Next Value Second Next Value
4 2021.03.01 Second Next Value Final Value

RecordId refers to the record on which the field change has been made. OldValue and NewValue refers to the value of the field before and after the change has been made.

Changes are consecutive. RecordId and OldValue and NewValue fields are arbitrary.

Value fields can switch back and forth between statuses.

Illustration of the desired output table with date slicing - so a querying for a status on a given date is possible using the validity dates:

RecordId Value ValidFromDate ValidToDate
1 A Value 1900.01.01. (or null) 2020.12.31.
1 Another Value 2021.01.01. 2021.01.31.
1 Yet Another Value 2022.02.01 2022.02.28.
1 A Value 2022.03.01 2022.03.31.
1 Yet Another Value 2022.04.01 9999.12.31 (or null)
4 B Value 1900.01.01 (or null) 2021.12.31
4 Next Value 2022.01.01 2022.01.31.
4 Second Next Value 2022.02.01 2022.02.28.
4 Final 2022.03.01 9999.12.31 (or null)

Is there an efficient way to do this?

CodePudding user response:

This code may not be the most efficient or scalable but it works - the part for getting the 'initial' values is what I would be concerned about so other people might have a better suggestion.

The initial query reads data from the table, and it uses the LEAD function to return the date from the 'next' row and then subtract 1 day using the DATEADD to calculate the ValidToDate. I've wrapped it in a CASE statement so that it doesn't read the date for any other RecordID.

The union gets the data for the rows where the ValidFromDate is NULL. It self joins to select only the MIN(DateStamp) and uses DATEADD to subtract a single day for the ValidToDate calculation.

SELECT L.RecordId,
       L.NewValue AS [Value],
       L.DateStamp AS ValidFromDate,
       CASE WHEN LEAD(L.RecordId, 1) OVER (ORDER BY L.RecordId, L.DateStamp) = L.RecordId
           THEN DATEADD(DD, -1, LEAD(L.DateStamp, 1) OVER (ORDER BY L.RecordId, L.DateStamp))
           ELSE NULL
       END AS ValidToDate
FROM #Log AS L
UNION ALL
SELECT L.RecordId,
       L.OldValue,
       NULL AS ValidFromDate,
       DATEADD(DD, -1, MinL.DateStamp) AS ValidToDate
FROM #Log AS L
     INNER JOIN (SELECT L.RecordId, MIN(L.DateStamp) AS DateStamp FROM #Log AS L GROUP BY L.RecordId) AS MinL ON MinL.DateStamp = L.DateStamp AND MinL.RecordId = L.RecordId
ORDER BY L.RecordId,
         L.DateStamp;

CodePudding user response:

One method would be to unpivot the first row, to get 2 rows for the "old" and "new" value, and then you can use LEAD to get the valid to date. You can determine the first row, by using ROW_NUMBER in a CTE (or derived table), and use the WHERE to only return the 1st row, or the "new" rows:

WITH CTE AS(
    SELECT RecordId,
           DateStamp,
           OldValue,
           NewValue,
           ROW_NUMBER() OVER (PARTITION BY RecordID ORDER BY DateStamp) AS RN
    FROM (VALUES(1,CONVERT(date,'20210101'),'A Value ','Another Value'),
                (1,CONVERT(date,'20210201'),'Another Value ','Yet Another Value'),
                (1,CONVERT(date,'20210301'),'Yet Another Value ','A Value'),
                (1,CONVERT(date,'20210401'),'A Value ','Yet Another Value'),
                (4,CONVERT(date,'20210101'),'B Value ','Next Value'),
                (4,CONVERT(date,'20210201'),'Next Value ','Second Next Value'),
                (4,CONVERT(date,'20210301'),'Second Next Value ','Final Value'))V(RecordId,DateStamp,OldValue,NewValue))
SELECT C.RecordId,
       V.Value AS Value,
       CASE V.ValueType WHEN 'New' THEN C.DateStamp END AS ValidFromDate,
       CASE V.ValueType WHEN 'New' THEN LEAD(DATEADD(DAY, -1, C.DateStamp)) OVER (PARTITION BY C.RecordID ORDER BY C.DateStamp) ELSE  DATEADD(DAY, -1, C.DateStamp) END AS ValidToDate
FROM CTE C
     CROSS APPLY (VALUES(C.OldValue, 'Old'),(C.NewValue, 'New'))V(Value,ValueType)
WHERE RN = 1
   OR ValueType = 'New';
  • Related