I have a kind of log table that I would like to transform using SQL Server - adding a validity interval (valid-from, valid-to fields) to records.
The source table is rather large (100K records), so an optimal solution would be required. It is acutally a table that tracks content (status) changes in a field for individual records.
Illustration of the source table:
RecordId | DateStamp | OldValue | Newvalue |
---|---|---|---|
1 | 2021.01.01. | A Value | Another Value |
1 | 2021.02.01 | Another Value | Yet Another Value |
1 | 2021.03.01 | Yet Another Value | A Value |
1 | 2021.04.01 | A Value | Yet Another Value |
4 | 2021.01.01 | B Value | Next Value |
4 | 2021.02.01 | Next Value | Second Next Value |
4 | 2021.03.01 | Second Next Value | Final Value |
RecordId
refers to the record on which the field change has been made. OldValue
and NewValue
refers to the value of the field before and after the change has been made.
Changes are consecutive. RecordId
and OldValue
and NewValue
fields are arbitrary.
Value
fields can switch back and forth between statuses.
Illustration of the desired output table with date slicing - so a querying for a status on a given date is possible using the validity dates:
RecordId | Value | ValidFromDate | ValidToDate |
---|---|---|---|
1 | A Value | 1900.01.01. (or null) | 2020.12.31. |
1 | Another Value | 2021.01.01. | 2021.01.31. |
1 | Yet Another Value | 2022.02.01 | 2022.02.28. |
1 | A Value | 2022.03.01 | 2022.03.31. |
1 | Yet Another Value | 2022.04.01 | 9999.12.31 (or null) |
4 | B Value | 1900.01.01 (or null) | 2021.12.31 |
4 | Next Value | 2022.01.01 | 2022.01.31. |
4 | Second Next Value | 2022.02.01 | 2022.02.28. |
4 | Final | 2022.03.01 | 9999.12.31 (or null) |
Is there an efficient way to do this?
CodePudding user response:
This code may not be the most efficient or scalable but it works - the part for getting the 'initial' values is what I would be concerned about so other people might have a better suggestion.
The initial query reads data from the table, and it uses the LEAD function to return the date from the 'next' row and then subtract 1 day using the DATEADD to calculate the ValidToDate. I've wrapped it in a CASE statement so that it doesn't read the date for any other RecordID.
The union gets the data for the rows where the ValidFromDate is NULL. It self joins to select only the MIN(DateStamp) and uses DATEADD to subtract a single day for the ValidToDate calculation.
SELECT L.RecordId,
L.NewValue AS [Value],
L.DateStamp AS ValidFromDate,
CASE WHEN LEAD(L.RecordId, 1) OVER (ORDER BY L.RecordId, L.DateStamp) = L.RecordId
THEN DATEADD(DD, -1, LEAD(L.DateStamp, 1) OVER (ORDER BY L.RecordId, L.DateStamp))
ELSE NULL
END AS ValidToDate
FROM #Log AS L
UNION ALL
SELECT L.RecordId,
L.OldValue,
NULL AS ValidFromDate,
DATEADD(DD, -1, MinL.DateStamp) AS ValidToDate
FROM #Log AS L
INNER JOIN (SELECT L.RecordId, MIN(L.DateStamp) AS DateStamp FROM #Log AS L GROUP BY L.RecordId) AS MinL ON MinL.DateStamp = L.DateStamp AND MinL.RecordId = L.RecordId
ORDER BY L.RecordId,
L.DateStamp;
CodePudding user response:
One method would be to unpivot the first row, to get 2 rows for the "old" and "new" value, and then you can use LEAD
to get the valid to date. You can determine the first row, by using ROW_NUMBER
in a CTE (or derived table), and use the WHERE
to only return the 1st row, or the "new" rows:
WITH CTE AS(
SELECT RecordId,
DateStamp,
OldValue,
NewValue,
ROW_NUMBER() OVER (PARTITION BY RecordID ORDER BY DateStamp) AS RN
FROM (VALUES(1,CONVERT(date,'20210101'),'A Value ','Another Value'),
(1,CONVERT(date,'20210201'),'Another Value ','Yet Another Value'),
(1,CONVERT(date,'20210301'),'Yet Another Value ','A Value'),
(1,CONVERT(date,'20210401'),'A Value ','Yet Another Value'),
(4,CONVERT(date,'20210101'),'B Value ','Next Value'),
(4,CONVERT(date,'20210201'),'Next Value ','Second Next Value'),
(4,CONVERT(date,'20210301'),'Second Next Value ','Final Value'))V(RecordId,DateStamp,OldValue,NewValue))
SELECT C.RecordId,
V.Value AS Value,
CASE V.ValueType WHEN 'New' THEN C.DateStamp END AS ValidFromDate,
CASE V.ValueType WHEN 'New' THEN LEAD(DATEADD(DAY, -1, C.DateStamp)) OVER (PARTITION BY C.RecordID ORDER BY C.DateStamp) ELSE DATEADD(DAY, -1, C.DateStamp) END AS ValidToDate
FROM CTE C
CROSS APPLY (VALUES(C.OldValue, 'Old'),(C.NewValue, 'New'))V(Value,ValueType)
WHERE RN = 1
OR ValueType = 'New';