Home > Software engineering >  Generate missing non-contiguous date ranges
Generate missing non-contiguous date ranges

Time:12-08

I have the following example dataset (actual table is about 30000 rows).

Data

I need to scan through the table and identify any non-contiguous dates for all references within the table that have non-contiguous data, and generate data to fill in the missing dates

So for the above reference I would need example data generated to be in bold below (showing existing rows for clarity):

ID START_DATE END_DATE
7172 2020-03-13 2020-10-22
7172 2020-10-23 2020-11-08
7172 2020-11-09 2020-11-09
7172 2020-11-10 2020-11-19
7172 2020-11-20 2020-11-20
7172 2020-11-21 2021-03-14
7172 2021-03-15 2021-03-17
7172 2021-03-18 2021-03-19
7172 2021-03-20 2021-03-28
7172 2021-03-29 2021-04-25
7172 2021-04-26 2021-30-04
7172 2021-05-01 2021-06-07
7172 2021-06-08 2021-06-08
7172 2021-06-09 2022-01-09

Can anyone help please?

CodePudding user response:

2021-30-04 is a bad date (hehe).

I took your table and used it to generate a table variable, and added a couple of rows:

DECLARE @TABLE TABLE (ID INT, START_DATE DATE, END_DATE DATE)
INSERT INTO @TABLE (ID, START_DATE, END_DATE) VALUES
(7172, '2020-03-13', '2020-10-22'), (7172, '2020-10-23', '2020-11-08'),
(7172, '2020-11-10', '2020-11-19'), (7172, '2020-11-20', '2020-11-20'),
(7172, '2020-11-21', '2021-03-14'), (7172, '2021-03-15', '2021-03-17'),
(7172, '2021-03-20', '2021-03-28'), (7172, '2021-03-29', '2021-04-25'),
(7172, '2021-05-01', '2021-06-07'), (7172, '2021-06-09', '2022-01-09'),
(1234, '2020-03-13', '2022-03-15'), (1234, '2022-03-20', '2022-03-25');

If I understand correctly, we'd be looking to generate rows:

--(7172, '2020-11-09', '2020-11-09'),
--(7172, '2021-03-18', '2021-03-19'),
--(7172, '2021-04-26', '2021-03-04'),
--(7172, '2021-06-08', '2021-06-08'),
--(1234, '2022-03-16', '2022-03-19'),

I poked at this some, and got the result you're looking for using some recursive CTE voodoo. I can't promise this is going to be performant, but it might be a good starting point:

DECLARE @MaxDate DATE = (SELECT MAX(END_DATE) FROM @TABLE)

;WITH DateRange AS (
SELECT ID, MIN(END_DATE) AS Date FROM @TABLE GROUP BY ID--@MinDate AS Date
UNION ALL
SELECT ID, DATEADD(DAY,1,Date)
  FROM DateRange
 WHERE Date < @MaxDate
), MissingDates AS (
SELECT dr.ID, dr.Date, t.START_DATE, t.END_DATE
  FROM DateRange dr
    LEFT OUTER JOIN @TABLE t
      ON dr.Date BETWEEN t.START_DATE AND t.END_DATE
      AND dr.ID = t.ID
 WHERE t.ID IS NULL  
), ranges AS (
SELECT s.ID, s.date AS START_DATE, s.date AS END_DATE, DATEADD(DAY,1,s.Date) AS NEXT_DATE
  FROM MissingDates s
    LEFT OUTER JOIN MissingDates n
      ON s.Date = DATEADD(DAY,1,n.Date)
      AND s.ID = n.ID
 WHERE n.Date IS NULL   
UNION ALL
SELECT a.ID, a.START_DATE, r.Date AS END_DATE, DATEADD(DAY,1,r.Date) AS NEXT_DATE
  FROM ranges a
    INNER JOIN MissingDates r
      ON a.NEXT_DATE = r.Date
      AND a.ID = r.ID
)
SELECT ID AS ID, START_DATE, MAX(END_DATE) AS END_DATE
  FROM ranges
 GROUP BY ID, START_DATE
OPTION (MAXRECURSION 0)
ID      START_DATE  END_DATE
----------------------------
7172    2020-11-09  2020-11-09
7172    2021-03-18  2021-03-19
7172    2021-04-26  2021-04-30
7172    2021-06-08  2021-06-08
7172    2022-01-10  2022-03-25
1234    2022-03-16  2022-03-19
  •  Tags:  
  • tsql
  • Related