Dates across multiple rows-CodePudding

I have a table that has the following columns: Item, Date, Status Description, and Stock Status. It contains historical data for the stocking status of items.

Below is a brief sample of the input in table form.

Item	Date	Status Description	Stock Status
ABC123	2020-10-02	Listed Out of Stock (ABC123)	Out of Stock
ABC123	2020-10-15	In Stock (ABC123)	In Stock
ABC123	2021-05-04	Listed Out of Stock (ABC123)	Out of Stock
ABC123	2021-07-15	Listed Out of Stock (ABC123)	Out of Stock
ABC123	2021-07-27	Listed Out of Stock (ABC123)	Out of Stock
ABC123	2021-08-09	Listed Out of Stock (ABC123)	Out of Stock
ABC123	2021-10-19	In Stock (ABC123)	In Stock
...	...	...	...

And here is a script to create the test input data into a temp table.

DROP TABLE #OOS_History;

CREATE TABLE #OOS_History 
(
    [Item] NVARCHAR(100),
    [Date] DATETIME2,
    [Status Description] NVARCHAR(100),
    [Stock Status] NVARCHAR(25)
);

INSERT INTO #OOS_History ( [Item], [Date], [Status Description], [Stock Status] ) 
VALUES  
    ('ABC123',  '10/2/2020 13:53',  'Listed Out of Stock (ABC123)', 'Out of Stock'),
    ('ABC123',  '10/15/2020 9:20',  'In Stock (ABC123)',    'In Stock'),
    ('ABC123',  '5/4/2021 8:22',    'Listed Out of Stock (ABC123)', 'Out of Stock'),
    ('ABC123',  '7/15/2021 13:47',  'Listed Out of Stock (ABC123)', 'Out of Stock'),
    ('ABC123',  '7/27/2021 8:04', 'Listed Out of Stock (ABC123)',   'Out of Stock'),
    ('ABC123',  '8/9/2021 13:12', 'Listed Out of Stock (ABC123)',   'Out of Stock'),
    ('ABC123',  '10/19/2021 8:04',  'In Stock (ABC123)',    'In Stock'),
    ('ABC123',  '10/28/2021 11:52', 'Listed Out of Stock (ABC123)', 'Out of Stock'),
    ('ABC123',  '10/29/2021 9:24',  'In Stock (ABC123)',    'In Stock'),
    ('ABC123',  '12/6/2021 7:00', 'Listed Out of Stock (ABC123)',   'Out of Stock'),
    ('ABC123',  '12/6/2021 7:02', 'Listed Out of Stock (ABC123)',   'Out of Stock'),
    ('ABC123',  '12/15/2021 10:47', 'Listed Out of Stock (ABC123)', 'Out of Stock'),
    ('ABC123',  '2/21/2022 14:25',  'In Stock (ABC123)',    'In Stock'),
    ('ABC123',  '4/7/2022 8:36',    'Listed Out of Stock (ABC123)', 'Out of Stock'),
    ('ABC123',  '4/13/2022 7:39', 'In Stock (ABC123)',  'In Stock'),
    ('ABC123',  '4/19/2022 13:06',  'Listed Out of Stock (ABC123)', 'Out of Stock'),
    ('ABC123',  '4/22/2022 14:07',  'In Stock (ABC123)',    'In Stock'),
    ('ABC123',  '4/28/2022 11:30',  'Listed Out of Stock (ABC123)', 'Out of Stock'),
    ('ABC123',  '5/21/2022 6:25', 'In Stock (ABC123)',  'In Stock'),
    ('ABC123',  '7/12/2022 14:10',  'Listed Out of Stock (ABC123)', 'Out of Stock');

My goal is to create a new table that has the following 3 columns: Item, Start Date, and End Date. Each row will represent an interval that an item is out of stock.

Below is the desired output which should list the intervals (start and end date) that an item was out of stock.

Item	Start Date	End Date
ABC123	2020-10-02	2020-10-15
ABC123	2021-05-04	2021-10-19
ABC123	2021-10-28	2021-10-29
ABC123	2021-12-06	2021-12-15
ABC123	2022-04-07	2022-04-13
ABC123	2022-04-19	2022-04-22
ABC123	2022-04-28	2022-05-21
ABC123	2022-07-12	NULL

I believe I can do this by finding the groups that each row belongs in. In this case there should be 8 groups (rows 1-2), (rows 3-7), (rows 8-9), (rows 10-13), (rows 14-15), (rows 16-17), (rows 18-19), and (row 20). Then I can take the first row from each group as the starting date and the last row from each group as the ending date.

I've done some preliminary research and this appears to be an islands and gaps problem. I've found an answer on stackoverflow that looks like a good starting point but I'm stuck on how to apply it to my problem.

I'm copying the SQL query structure from the answer on this stackoverflow post (

CodePudding user response：

Here's a more intuitive approach, in my opinion:

WITH
    OOS_History_With_Status_Change_Flag AS 
        (
        SELECT
            *,
            CASE 
                WHEN LAG([Stock Status], 1, 1) OVER (ORDER BY [Date]) <> [Stock Status]
                    THEN 1 -- i.e. the status changed on this date
            END AS [Status Change]
        FROM
            #OOS_History
        ),
        
    OOS_History_Only_Status_Changes AS
        (
        SELECT
            *,
            CAST([Date] AS DATE) AS [Start Date],
            CAST(LEAD([Date], 1) OVER (ORDER BY [Date]) AS DATE) AS [End Date]
        FROM
            OOS_History_With_Status_Change_Flag
        WHERE
            [Status Change] = 1
        )

SELECT
    [Item],
    [Start Date],
    [End Date]
FROM
    OOS_History_Only_Status_Changes
WHERE
    [Stock Status] = 'Out of Stock'
ORDER BY
    [Start Date]
;

In essence, the only rows you care about from your initial dataset are ones that represent a status change (i.e. rows with a lagging row, when ordered by date, of differing status). The first CTE creates this flag column using the LAG() function; the second CTE uses this flag as a filter and defines the Start Date and End Date columns.

And just like that, you're done! Select rows where the status is out-of-stock, and you're golden.