I have the following example dataset (actual table is about 30000 rows).
I need to scan through the table and identify any non-contiguous dates for all references within the table that have non-contiguous data, and generate data to fill in the missing dates
So for the above reference I would need example data generated to be in bold below (showing existing rows for clarity):
ID | START_DATE | END_DATE |
---|---|---|
7172 | 2020-03-13 | 2020-10-22 |
7172 | 2020-10-23 | 2020-11-08 |
7172 | 2020-11-09 | 2020-11-09 |
7172 | 2020-11-10 | 2020-11-19 |
7172 | 2020-11-20 | 2020-11-20 |
7172 | 2020-11-21 | 2021-03-14 |
7172 | 2021-03-15 | 2021-03-17 |
7172 | 2021-03-18 | 2021-03-19 |
7172 | 2021-03-20 | 2021-03-28 |
7172 | 2021-03-29 | 2021-04-25 |
7172 | 2021-04-26 | 2021-30-04 |
7172 | 2021-05-01 | 2021-06-07 |
7172 | 2021-06-08 | 2021-06-08 |
7172 | 2021-06-09 | 2022-01-09 |
Can anyone help please?
CodePudding user response:
2021-30-04 is a bad date (hehe).
I took your table and used it to generate a table variable, and added a couple of rows:
DECLARE @TABLE TABLE (ID INT, START_DATE DATE, END_DATE DATE)
INSERT INTO @TABLE (ID, START_DATE, END_DATE) VALUES
(7172, '2020-03-13', '2020-10-22'), (7172, '2020-10-23', '2020-11-08'),
(7172, '2020-11-10', '2020-11-19'), (7172, '2020-11-20', '2020-11-20'),
(7172, '2020-11-21', '2021-03-14'), (7172, '2021-03-15', '2021-03-17'),
(7172, '2021-03-20', '2021-03-28'), (7172, '2021-03-29', '2021-04-25'),
(7172, '2021-05-01', '2021-06-07'), (7172, '2021-06-09', '2022-01-09'),
(1234, '2020-03-13', '2022-03-15'), (1234, '2022-03-20', '2022-03-25');
If I understand correctly, we'd be looking to generate rows:
--(7172, '2020-11-09', '2020-11-09'),
--(7172, '2021-03-18', '2021-03-19'),
--(7172, '2021-04-26', '2021-03-04'),
--(7172, '2021-06-08', '2021-06-08'),
--(1234, '2022-03-16', '2022-03-19'),
I poked at this some, and got the result you're looking for using some recursive CTE voodoo. I can't promise this is going to be performant, but it might be a good starting point:
DECLARE @MaxDate DATE = (SELECT MAX(END_DATE) FROM @TABLE)
;WITH DateRange AS (
SELECT ID, MIN(END_DATE) AS Date FROM @TABLE GROUP BY ID--@MinDate AS Date
UNION ALL
SELECT ID, DATEADD(DAY,1,Date)
FROM DateRange
WHERE Date < @MaxDate
), MissingDates AS (
SELECT dr.ID, dr.Date, t.START_DATE, t.END_DATE
FROM DateRange dr
LEFT OUTER JOIN @TABLE t
ON dr.Date BETWEEN t.START_DATE AND t.END_DATE
AND dr.ID = t.ID
WHERE t.ID IS NULL
), ranges AS (
SELECT s.ID, s.date AS START_DATE, s.date AS END_DATE, DATEADD(DAY,1,s.Date) AS NEXT_DATE
FROM MissingDates s
LEFT OUTER JOIN MissingDates n
ON s.Date = DATEADD(DAY,1,n.Date)
AND s.ID = n.ID
WHERE n.Date IS NULL
UNION ALL
SELECT a.ID, a.START_DATE, r.Date AS END_DATE, DATEADD(DAY,1,r.Date) AS NEXT_DATE
FROM ranges a
INNER JOIN MissingDates r
ON a.NEXT_DATE = r.Date
AND a.ID = r.ID
)
SELECT ID AS ID, START_DATE, MAX(END_DATE) AS END_DATE
FROM ranges
GROUP BY ID, START_DATE
OPTION (MAXRECURSION 0)
ID START_DATE END_DATE
----------------------------
7172 2020-11-09 2020-11-09
7172 2021-03-18 2021-03-19
7172 2021-04-26 2021-04-30
7172 2021-06-08 2021-06-08
7172 2022-01-10 2022-03-25
1234 2022-03-16 2022-03-19