Home > Enterprise >  Generate Start Date and End Date based on the contiguous rows
Generate Start Date and End Date based on the contiguous rows

Time:05-26

I have a table :

ID Startdate Enddate TEXT
0011 2022-02-07 2022-02-07 TEXT1
0011 2022-02-04 2022-02-05 TEXT2
0011 2022-02-06 2022-02-06 TEXT3
0011 2022-02-03 2022-02-03 TEXT4
0011 2022-02-03 2022-02-04 TEXT5
0011 2022-02-02 2022-02-07 TEXT6
0011 2022-02-02 2022-02-02 TEXT7
0011 2021-12-01 2021-12-03 TEXT8

Expected output:

ID Startdate Enddate TEXT
0011 2022-02-02 2022-02-07 TEXT1,TEXT2,TEXT3,TEXT4,TEXT5,TEXT6,TEXT7
0011 2021-12-01 2021-12-03 TEXT8

I tried with :

WITH _DAYS AS (
                                SELECT DATEADD(DAY, SEQ4(), '2021-12-01') AS DAY
                                  FROM TABLE(GENERATOR(ROWCOUNT => 68))
                            ), _GRPS AS (
                              SELECT *
                                   , DATEDIFF(DAY, '2021-12-01', D.DAY) - DENSE_RANK() OVER(PARTITION BY PASS1.MEMBER_ID ORDER BY D.DAY) AS GRP
                                FROM _DAYS AS D
                                JOIN table PASS1
                                ON D.DAY BETWEEN PASS1.Startdate AND PASS1.Enddate
                            )
                            SELECT ID
                                 , TEXT
                                 , MIN(DAY) AS START_DATE
                                 , MAX(DAY) AS END_DATE    
                              FROM _GRPS
                             GROUP BY ID,TEXT, GRP

I was able to achieve the desired output with only start-date and end-date in the table but the inclusion of TEXT column did not give me the desired output.

Please suggest !

CodePudding user response:

Mostly a fix-up of Greg's Answer..

With a CTE for the pass1 data:

select * from values
    ('0011',  '2022-02-07',   '2022-02-07',   'TEXT1'),
    ('0011',  '2022-02-04',   '2022-02-05',   'TEXT2'),
    ('0011',  '2022-02-06',   '2022-02-06',   'TEXT3'),
    ('0011',  '2022-02-03',   '2022-02-03',   'TEXT4'),
    ('0011',  '2022-02-03',   '2022-02-04',   'TEXT5'),
    ('0011',  '2022-02-02',   '2022-02-07',   'TEXT6'),
    ('0011',  '2022-02-02',   '2022-02-02',   'TEXT7'),
    ('0011',  '2021-12-01',   '2021-12-03',   'TEXT8')
)
WITH _DAYS AS (
    SELECT 
        DATEADD(DAY, ROW_NUMBER() OVER(ORDER BY NULL)-1, '2021-12-01')::date AS DAY
    FROM TABLE(GENERATOR(ROWCOUNT => 68))
), _GRPS AS (
    SELECT *
       ,DATEDIFF(DAY, '2021-12-01', D.DAY) - DENSE_RANK() OVER(PARTITION BY PASS1.ID ORDER BY D.DAY) AS GRP
    FROM _DAYS AS D
    JOIN PASS1
    ON D.DAY BETWEEN PASS1.Startdate AND PASS1.Enddate
)
SELECT 
    ID
     ,MIN(DAY) AS START_DATE
     ,MAX(DAY) AS END_DATE    
     ,listagg(distinct TEXT, ',') WITHIN GROUP (ORDER BY TEXT) as TEXT
FROM _GRPS
GROUP BY ID,GRP

gives:

ID START_DATE END_DATE TEXT
0011 2022-02-02 2022-02-07 TEXT1,TEXT2,TEXT3,TEXT4,TEXT5,TEXT6,TEXT7
0011 2021-12-01 2021-12-03 TEXT8

you should use ROW_NUMBER to get continuous values, as SEQx() can and do have gaps. In Greg's answer he used ORDER BY on the sub-select/CTE, when you should use the WINTHIN GROUP of the LISTAGG to ORDER BY, as it's more targeted sort.

Also given you have a generator, in _DAYS you can use that as the first half of the gaps-and-islands and thus skip the math..

WITH _DAYS AS (
    SELECT 
        ROW_NUMBER() OVER(ORDER BY NULL)-1 as rn,
        DATEADD(DAY, rn, '2021-12-01')::date AS DAY
    FROM TABLE(GENERATOR(ROWCOUNT => 68))
), _GRPS AS (
    SELECT *
       ,d.rn - DENSE_RANK() OVER(PARTITION BY PASS1.ID ORDER BY D.DAY) as grp
    FROM _DAYS AS D
    JOIN PASS1
    ON D.DAY BETWEEN PASS1.Startdate AND PASS1.Enddate
)
...

CodePudding user response:

LISTAGG will do what you need, but there are duplicate TEXT values and they're out of order. You can order them in the table expression above the final query in the CTE and use distinct to deduplicate the values:

create table pass1(id int, startdate date, enddate date, text string);

insert into pass1(id, startdate, enddate, text) values

(0011,  '2022-02-07',   '2022-02-07',   'TEXT1'),
(0011,  '2022-02-04',   '2022-02-05',   'TEXT2'),
(0011,  '2022-02-06',   '2022-02-06',   'TEXT3'),
(0011,  '2022-02-03',   '2022-02-03',   'TEXT4'),
(0011,  '2022-02-03',   '2022-02-04',   'TEXT5'),
(0011,  '2022-02-02',   '2022-02-07',   'TEXT6'),
(0011,  '2022-02-02',   '2022-02-02',   'TEXT7'),
(0011,  '2021-12-01',   '2021-12-03',   'TEXT8');


WITH _DAYS AS (
                                SELECT DATEADD(DAY, SEQ4(), '2021-12-01') AS DAY
                                  FROM TABLE(GENERATOR(ROWCOUNT => 68))
                            ), _GRPS AS (
                              SELECT *
                                   , DATEDIFF(DAY, '2021-12-01', D.DAY) - DENSE_RANK() OVER(PARTITION BY PASS1.ID ORDER BY D.DAY) AS GRP
                                FROM _DAYS AS D
                                JOIN PASS1
                                ON D.DAY BETWEEN PASS1.Startdate AND PASS1.Enddate
                                order by TEXT
                            )
                            SELECT ID
                                 , listagg(distinct TEXT, ',') as TEXT
                                 , MIN(DAY) AS START_DATE
                                 , MAX(DAY) AS END_DATE    
                              FROM _GRPS
  • Related