Can someone help me combining consecutive dates in different rows as shown above.Rows in first image are input and records in 2nd image are output.
CodePudding user response:
In case it helps to find information about this in the future, this kind of algorithm is called "islands and gaps". Below is an example of code that will create what you are looking for. This code has a few assumptions:
- The dates have no time portion. This may not be an issue but I did not test it.
- Dates are not consolidated across different WID/LIDs.
- There are no null dates in
EndDate
. If nulls are present, the data needs to be brought in like this:ISNULL(EndDate, '9999-12-31')
Here is the code:
IF OBJECT_ID('tempdb..SourceData') IS NOT NULL
DROP TABLE #SourceData;
CREATE TABLE #SourceData
(
WID VARCHAR(2) NOT NULL
, LID VARCHAR(2) NOT NULL
, StartDate DATE NOT NULL
, EndDate DATE NOT NULL
)
INSERT INTO #SourceData (WID, LID, StartDate, EndDate) VALUES
('W1','L1','1960-02-10','1988-03-22'),
('W1','L1','1988-03-23','1988-03-28'),
('W1','L1','1991-03-14','2010-10-20'),
('W2','L2','1964-10-29','1991-07-04'),
('W2','L2','1991-07-05','1992-01-28'),
('W2','L2','1992-01-29','1992-01-30');
IF OBJECT_ID('tempdb..ConsolidatedData') IS NOT NULL
DROP TABLE #ConsolidatedData;
CREATE TABLE #ConsolidatedData
(
WID VARCHAR(2) NOT NULL
, LID VARCHAR(2) NOT NULL
, StartDate DATE NOT NULL
, EndDate DATE NULL
);
WITH base_data AS
(
SELECT
WID
, LID
, StartDate
, EndDate
-- For each record, get the closest previous thru date, for consolidating records below
, MAX(EndDate) OVER (
PARTITION BY WID, LID
ORDER BY StartDate, EndDate
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS RunningMaxEndDate
FROM #SourceData
)
, islands AS
(
SELECT
WID
, LID
, StartDate
, EndDate
-- Create a running count of each gap (island), which prevents them from being consolidated
, SUM(IIF(RunningMaxEndDate >= DATEADD(DAY, -1, StartDate), 0, 1)) OVER (
PARTITION BY WID, LID
ORDER BY StartDate, EndDate
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS IslandNumber
FROM base_data
)
INSERT INTO #ConsolidatedData
(
WID
, LID
, StartDate
, EndDate
)
SELECT
WID
, LID
, MIN(StartDate) AS StartDate
, MAX(EndDate) AS EndDate
FROM islands
GROUP BY
WID
, LID
, IslandNumber;
SELECT * FROM #SourceData;
SELECT * FROM #ConsolidatedData;
DROP TABLE #SourceData;
DROP TABLE #ConsolidatedData;
CodePudding user response:
No idea how performant this will be on large datasets or if it will cover all scenarios, but it does return the desired output from the sample data provided.
SELECT DISTINCT
WID
,LID
,ISNULL(StartDate, LAG(StartDate) OVER (PARTITION BY WID, LID ORDER BY RowNum)) as StartDate
,ISNULL(Enddate, LEAD(endDate) OVER (PARTITION BY WID, LID ORDER BY RowNum)) as Enddate
FROM (
SELECT
WID
,LID
,IIF(DATEADD(dd, -1, startdate) = ISNULL(LAG(endDate) OVER (PARTITION BY WID, LID ORDER BY enddate), enddate), null, startdate) AS StartDate
,IIF(DATEADD(dd, 1, enddate) = LEAD(startDate) OVER (PARTITION BY WID, LID ORDER BY Startdate), null, enddate) AS EndDate
,ROW_NUMBER() OVER (PARTITION BY WID, LID ORDER BY startDate, Enddate) AS rownum
FROM (VALUES
('W1','L1','1960-02-10','1988-03-22'),
('W1','L1','1988-03-23','1988-03-28'),
('W1','L1','1991-03-14','2010-10-20'),
('W2','L2','1964-10-29','1991-07-04'),
('W2','L2','1991-07-05','1992-01-28'),
('W2','L2','1992-01-29','1992-01-30')
) Sub(WID, LID, StartDate, EndDate)
) sub
WHERE
ISNULL(StartDate, EndDate) IS NOT NULL
ORDER BY
wid
,lid
,StartDate
,Enddate
Returns:
WID | LID | StartDate | Enddate
--------------------------------------
W1 | L1 | 1960-02-10 | 1988-03-28
W1 | L1 | 1991-03-14 | 2010-10-20
W2 | L2 | 1964-10-29 | 1992-01-30