I have a table :
ID | Startdate | Enddate | TEXT |
---|---|---|---|
0011 | 2022-02-07 | 2022-02-07 | TEXT1 |
0011 | 2022-02-04 | 2022-02-05 | TEXT2 |
0011 | 2022-02-06 | 2022-02-06 | TEXT3 |
0011 | 2022-02-03 | 2022-02-03 | TEXT4 |
0011 | 2022-02-03 | 2022-02-04 | TEXT5 |
0011 | 2022-02-02 | 2022-02-07 | TEXT6 |
0011 | 2022-02-02 | 2022-02-02 | TEXT7 |
0011 | 2021-12-01 | 2021-12-03 | TEXT8 |
Expected output:
ID | Startdate | Enddate | TEXT |
---|---|---|---|
0011 | 2022-02-02 | 2022-02-07 | TEXT1,TEXT2,TEXT3,TEXT4,TEXT5,TEXT6,TEXT7 |
0011 | 2021-12-01 | 2021-12-03 | TEXT8 |
I tried with :
WITH _DAYS AS (
SELECT DATEADD(DAY, SEQ4(), '2021-12-01') AS DAY
FROM TABLE(GENERATOR(ROWCOUNT => 68))
), _GRPS AS (
SELECT *
, DATEDIFF(DAY, '2021-12-01', D.DAY) - DENSE_RANK() OVER(PARTITION BY PASS1.MEMBER_ID ORDER BY D.DAY) AS GRP
FROM _DAYS AS D
JOIN table PASS1
ON D.DAY BETWEEN PASS1.Startdate AND PASS1.Enddate
)
SELECT ID
, TEXT
, MIN(DAY) AS START_DATE
, MAX(DAY) AS END_DATE
FROM _GRPS
GROUP BY ID,TEXT, GRP
I was able to achieve the desired output with only start-date and end-date in the table but the inclusion of TEXT column did not give me the desired output.
Please suggest !
CodePudding user response:
Mostly a fix-up of Greg's Answer..
With a CTE for the pass1
data:
select * from values
('0011', '2022-02-07', '2022-02-07', 'TEXT1'),
('0011', '2022-02-04', '2022-02-05', 'TEXT2'),
('0011', '2022-02-06', '2022-02-06', 'TEXT3'),
('0011', '2022-02-03', '2022-02-03', 'TEXT4'),
('0011', '2022-02-03', '2022-02-04', 'TEXT5'),
('0011', '2022-02-02', '2022-02-07', 'TEXT6'),
('0011', '2022-02-02', '2022-02-02', 'TEXT7'),
('0011', '2021-12-01', '2021-12-03', 'TEXT8')
)
WITH _DAYS AS (
SELECT
DATEADD(DAY, ROW_NUMBER() OVER(ORDER BY NULL)-1, '2021-12-01')::date AS DAY
FROM TABLE(GENERATOR(ROWCOUNT => 68))
), _GRPS AS (
SELECT *
,DATEDIFF(DAY, '2021-12-01', D.DAY) - DENSE_RANK() OVER(PARTITION BY PASS1.ID ORDER BY D.DAY) AS GRP
FROM _DAYS AS D
JOIN PASS1
ON D.DAY BETWEEN PASS1.Startdate AND PASS1.Enddate
)
SELECT
ID
,MIN(DAY) AS START_DATE
,MAX(DAY) AS END_DATE
,listagg(distinct TEXT, ',') WITHIN GROUP (ORDER BY TEXT) as TEXT
FROM _GRPS
GROUP BY ID,GRP
gives:
ID | START_DATE | END_DATE | TEXT |
---|---|---|---|
0011 | 2022-02-02 | 2022-02-07 | TEXT1,TEXT2,TEXT3,TEXT4,TEXT5,TEXT6,TEXT7 |
0011 | 2021-12-01 | 2021-12-03 | TEXT8 |
you should use ROW_NUMBER
to get continuous values, as SEQx()
can and do have gaps. In Greg's answer he used ORDER BY on the sub-select/CTE, when you should use the WINTHIN GROUP of the LISTAGG to ORDER BY, as it's more targeted sort.
Also given you have a generator, in _DAYS
you can use that as the first half of the gaps-and-islands and thus skip the math..
WITH _DAYS AS (
SELECT
ROW_NUMBER() OVER(ORDER BY NULL)-1 as rn,
DATEADD(DAY, rn, '2021-12-01')::date AS DAY
FROM TABLE(GENERATOR(ROWCOUNT => 68))
), _GRPS AS (
SELECT *
,d.rn - DENSE_RANK() OVER(PARTITION BY PASS1.ID ORDER BY D.DAY) as grp
FROM _DAYS AS D
JOIN PASS1
ON D.DAY BETWEEN PASS1.Startdate AND PASS1.Enddate
)
...
CodePudding user response:
LISTAGG will do what you need, but there are duplicate TEXT values and they're out of order. You can order them in the table expression above the final query in the CTE and use distinct
to deduplicate the values:
create table pass1(id int, startdate date, enddate date, text string);
insert into pass1(id, startdate, enddate, text) values
(0011, '2022-02-07', '2022-02-07', 'TEXT1'),
(0011, '2022-02-04', '2022-02-05', 'TEXT2'),
(0011, '2022-02-06', '2022-02-06', 'TEXT3'),
(0011, '2022-02-03', '2022-02-03', 'TEXT4'),
(0011, '2022-02-03', '2022-02-04', 'TEXT5'),
(0011, '2022-02-02', '2022-02-07', 'TEXT6'),
(0011, '2022-02-02', '2022-02-02', 'TEXT7'),
(0011, '2021-12-01', '2021-12-03', 'TEXT8');
WITH _DAYS AS (
SELECT DATEADD(DAY, SEQ4(), '2021-12-01') AS DAY
FROM TABLE(GENERATOR(ROWCOUNT => 68))
), _GRPS AS (
SELECT *
, DATEDIFF(DAY, '2021-12-01', D.DAY) - DENSE_RANK() OVER(PARTITION BY PASS1.ID ORDER BY D.DAY) AS GRP
FROM _DAYS AS D
JOIN PASS1
ON D.DAY BETWEEN PASS1.Startdate AND PASS1.Enddate
order by TEXT
)
SELECT ID
, listagg(distinct TEXT, ',') as TEXT
, MIN(DAY) AS START_DATE
, MAX(DAY) AS END_DATE
FROM _GRPS