Home > OS >  Combine two consecutive dates in different rows
Combine two consecutive dates in different rows

Time:11-09

Can someone help me combining consecutive dates in different rows as shown below

Can someone help me combining consecutive dates in different rows as shown above.Rows in first image are input and records in 2nd image are output.

CodePudding user response:

In case it helps to find information about this in the future, this kind of algorithm is called "islands and gaps". Below is an example of code that will create what you are looking for. This code has a few assumptions:

  • The dates have no time portion. This may not be an issue but I did not test it.
  • Dates are not consolidated across different WID/LIDs.
  • There are no null dates in EndDate. If nulls are present, the data needs to be brought in like this: ISNULL(EndDate, '9999-12-31')

Here is the code:

IF OBJECT_ID('tempdb..SourceData') IS NOT NULL
    DROP TABLE #SourceData;

CREATE TABLE #SourceData
(
    WID VARCHAR(2) NOT NULL
    , LID VARCHAR(2) NOT NULL
    , StartDate DATE NOT NULL
    , EndDate DATE NOT NULL
)

INSERT INTO #SourceData (WID, LID, StartDate, EndDate) VALUES
    ('W1','L1','1960-02-10','1988-03-22'),
    ('W1','L1','1988-03-23','1988-03-28'),
    ('W1','L1','1991-03-14','2010-10-20'),
    ('W2','L2','1964-10-29','1991-07-04'),
    ('W2','L2','1991-07-05','1992-01-28'),
    ('W2','L2','1992-01-29','1992-01-30');

IF OBJECT_ID('tempdb..ConsolidatedData') IS NOT NULL
    DROP TABLE #ConsolidatedData;

CREATE TABLE #ConsolidatedData
(
    WID VARCHAR(2) NOT NULL
    , LID VARCHAR(2) NOT NULL
    , StartDate DATE NOT NULL
    , EndDate DATE NULL
);

WITH base_data AS
(
    SELECT
        WID
        , LID
        , StartDate
        , EndDate
        -- For each record, get the closest previous thru date, for consolidating records below
        , MAX(EndDate) OVER (
            PARTITION BY WID, LID
            ORDER BY StartDate, EndDate
            ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS RunningMaxEndDate
    FROM #SourceData
)
, islands AS
(
    SELECT
        WID
        , LID
        , StartDate
        , EndDate
        -- Create a running count of each gap (island), which prevents them from being consolidated
        , SUM(IIF(RunningMaxEndDate >= DATEADD(DAY, -1, StartDate), 0, 1)) OVER (
            PARTITION BY WID, LID
            ORDER BY StartDate, EndDate
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS IslandNumber
    FROM base_data
)

INSERT INTO #ConsolidatedData
    (
        WID
        , LID
        , StartDate
        , EndDate
    )
    SELECT
        WID
        , LID
        , MIN(StartDate) AS StartDate
        , MAX(EndDate) AS EndDate
    FROM islands
    GROUP BY
        WID
        , LID
        , IslandNumber;

SELECT * FROM #SourceData;
SELECT * FROM #ConsolidatedData;

DROP TABLE #SourceData;
DROP TABLE #ConsolidatedData;

CodePudding user response:

No idea how performant this will be on large datasets or if it will cover all scenarios, but it does return the desired output from the sample data provided.

SELECT DISTINCT
    WID
    ,LID
    ,ISNULL(StartDate, LAG(StartDate) OVER (PARTITION BY WID, LID ORDER BY RowNum)) as StartDate
    ,ISNULL(Enddate, LEAD(endDate) OVER (PARTITION BY WID, LID ORDER BY RowNum)) as Enddate
FROM (
    SELECT
        WID
        ,LID
        ,IIF(DATEADD(dd, -1, startdate) = ISNULL(LAG(endDate) OVER (PARTITION BY WID, LID ORDER BY enddate), enddate), null, startdate) AS StartDate
        ,IIF(DATEADD(dd, 1, enddate) = LEAD(startDate) OVER (PARTITION BY WID, LID ORDER BY Startdate), null, enddate) AS EndDate
        ,ROW_NUMBER() OVER (PARTITION BY WID, LID ORDER BY startDate, Enddate) AS  rownum
    FROM (VALUES 
            ('W1','L1','1960-02-10','1988-03-22'),
            ('W1','L1','1988-03-23','1988-03-28'),
            ('W1','L1','1991-03-14','2010-10-20'),
            ('W2','L2','1964-10-29','1991-07-04'),
            ('W2','L2','1991-07-05','1992-01-28'),
            ('W2','L2','1992-01-29','1992-01-30')
        ) Sub(WID, LID, StartDate, EndDate)
) sub
WHERE
    ISNULL(StartDate, EndDate) IS NOT NULL
ORDER BY 
    wid
    ,lid 
    ,StartDate
    ,Enddate

Returns:

WID | LID  | StartDate  | Enddate
--------------------------------------
W1  | L1   | 1960-02-10 | 1988-03-28
W1  | L1   | 1991-03-14 | 2010-10-20
W2  | L2   | 1964-10-29 | 1992-01-30
  • Related