I have the following data in a table:
ItemID | Date | Status |
---|---|---|
001 | 2021-01-12 | Active |
001 | 2021-01-16 | Discontinued |
001 | 2021-01-20 | Active |
I need to fill in the remaining dates like this:
ItemID | Date | Status |
---|---|---|
001 | 2021-01-12 | Active |
001 | 2021-01-13 | Active |
001 | 2021-01-14 | Active |
001 | 2021-01-15 | Active |
001 | 2021-01-16 | Discontinued |
001 | 2021-01-17 | Discontinued |
001 | 2021-01-18 | Discontinued |
001 | 2021-01-19 | Discontinued |
001 | 2021-01-20 | Active |
Also, I need suggestions on will it be efficient to fill data like this or create two different columns for Valid from and to dates in Data Warehouse?
CodePudding user response:
I have a working solution, but I am sure there are better ways to do this. I assume you would like a working solution, and then you can investigate the performance and optimize it if need be.
As pointed out in the comments, to solve this it is easiest if you have a calendar table. I assume you do not have anything, so I start from scratch. I generate the numbers 0 - 9 and then through successive CROSS JOINS
I use those numbers to generate the numbers 0 - 10,000. I did make the assumption that there are not more than 10,000 days between the minimum date and the maximum date, but if this is not correct you can change the code to generate more numbers.
My approach uses several common table expressions as this is how I work to incrementally solve a problem. So first generate the digits, then generate numbers, then determine the minimum and maximum dates for each ItemID
, then create a recordset that includes all the dates between the minimum and maximum dates for each ItemID
, then I LEFT JOIN
this to copy the Status
. Finally, you have the interesting problem of how to get the last non NULL
value for a column, and there are several approaches. Here is one article of many you can see different approaches: https://www.mssqltips.com/sqlservertip/7379/last-non-null-value-set-of-sql-server-records/ I used the approach that uses the MAX
function in a window.
So, putting this all together into a script and starting with your data in a table variable (as well as adding some records for another test), the whole things looks like this:
DECLARE @Data TABLE([ItemID] VARCHAR(3), [Date] DATE, [Status] VARCHAR(15));
INSERT INTO @Data ([ItemID],[Date],[Status])
VALUES ('001', '2021-01-12', 'Active'), ('001', '2021-01-16','Discontinued'),('001', '2021-01-20','Active'),
('002','2022-02-01','Active'), ('002','2022-03-01','Discontinued');
;WITH digits (I) AS
(
SELECT I
FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS digits (I)
)
,integers (I) AS (
SELECT D1.I (10*D2.I) (100*D3.I) (1000*D4.I)
FROM digits AS D1 CROSS JOIN digits AS D2 CROSS JOIN digits AS D3 CROSS JOIN digits AS D4
), itemMinMaxDates AS (
SELECT [ItemID], MIN([Date]) AS [MinDate], MAX([Date]) AS [MaxDate]
FROM @Data GROUP BY [ItemID]
), itemsWithAllDates AS
(
SELECT [imm].[ItemID], DATEADD(DAY,i.I, imm.[MinDate]) AS [Date] FROM [itemMinMaxDates] AS imm CROSS JOIN [integers] AS i
WHERE DATEADD(DAY,i.I, imm.[MinDate]) BETWEEN imm.[MinDate] AND imm.[MaxDate]
), itemsWithAllDatesAndStatus AS
(
SELECT [allDates].[ItemID], [allDates].[Date], [d].[Status] FROM [itemsWithAllDates] AS allDates
LEFT OUTER JOIN @Data AS d ON [allDates].[ItemID] = [d].[ItemID] AND [allDates].[Date] = d.[Date]
), grp AS
(
SELECT [itemsWithAllDatesAndStatus].[ItemID],
[itemsWithAllDatesAndStatus].[Date],
[itemsWithAllDatesAndStatus].[Status],
MAX(IIF([itemsWithAllDatesAndStatus].[Status] IS NOT NULL, [itemsWithAllDatesAndStatus].[Date], NULL)) OVER (PARTITION BY [itemsWithAllDatesAndStatus].[ItemID] ORDER BY [itemsWithAllDatesAndStatus].[Date] ROWS UNBOUNDED PRECEDING) AS grp
FROM itemsWithAllDatesAndStatus
)
SELECT [grp].[ItemID], [grp].[Date],
MAX([grp].[Status]) OVER (PARTITION BY [grp].[ItemID], grp ORDER BY [grp].[Date] ROWS UNBOUNDED PRECEDING) AS [Status]
FROM [grp]
ORDER BY [grp].[ItemID], [grp].[Date];
The result is what you have shown (as well as the data I included for a test):
ItemID | Date | Status |
---|---|---|
001 | 2021-01-12 | Active |
001 | 2021-01-13 | Active |
001 | 2021-01-14 | Active |
001 | 2021-01-15 | Active |
001 | 2021-01-16 | Discontinued |
001 | 2021-01-17 | Discontinued |
001 | 2021-01-18 | Discontinued |
001 | 2021-01-19 | Discontinued |
001 | 2021-01-20 | Active |
002 | 2022-02-01 | Active |
002 | 2022-02-02 | Active |
002 | 2022-02-03 | Active |
002 | 2022-02-04 | Active |
002 | 2022-02-05 | Active |
002 | 2022-02-06 | Active |
002 | 2022-02-07 | Active |
002 | 2022-02-08 | Active |
002 | 2022-02-09 | Active |
002 | 2022-02-10 | Active |
002 | 2022-02-11 | Active |
002 | 2022-02-12 | Active |
002 | 2022-02-13 | Active |
002 | 2022-02-14 | Active |
002 | 2022-02-15 | Active |
002 | 2022-02-16 | Active |
002 | 2022-02-17 | Active |
002 | 2022-02-18 | Active |
002 | 2022-02-19 | Active |
002 | 2022-02-20 | Active |
002 | 2022-02-21 | Active |
002 | 2022-02-22 | Active |
002 | 2022-02-23 | Active |
002 | 2022-02-24 | Active |
002 | 2022-02-25 | Active |
002 | 2022-02-26 | Active |
002 | 2022-02-27 | Active |
002 | 2022-02-28 | Active |
002 | 2022-03-01 | Discontinued |
Like I said, this is a working solution, but it is likely not the best or most efficient solution - but it gets you up and running.