Home > Net >  Combine periods by user
Combine periods by user

Time:04-25

I want to combine periods(start -> end dates) by the intervals they overlap into a single interval by user.

 --------- ------------- ------------ ------------ 
| USER_ID | CONTRACT_ID | START_DATE |  END_DATE  |
 --------- ------------- ------------ ------------ 
|       1 |          14 | 18.02.2021 | 18.04.2022 |
|       1 |          13 | 01.01.2019 | 01.01.2020 |
|       1 |          12 | 01.01.2018 | 01.01.2019 |
|       1 |          11 | 13.02.2017 | 13.02.2019 |
|       2 |          23 | 19.06.2021 | 18.04.2022 |
|       2 |          22 | 01.07.2019 | 01.07.2020 |
|       2 |          21 | 19.01.2019 | 19.01.2020 |
 --------- ------------- ------------ ------------ 

And as a result I want table like this:

 --------- ------------ ------------ 
| USER_ID | START_DATE |  END_DATE  |
 --------- ------------ ------------ 
|       1 | 18.02.2021 | 18.04.2022 |
|       1 | 13.02.2017 | 01.01.2020 |
|       2 | 19.06.2021 | 18.04.2022 |
|       2 | 19.01.2019 | 01.07.2020 |
 --------- ------------ ------------ 

I tried different options but nothing seems to work.

CodePudding user response:

WITH DATAD (USER_ID,CONTRACT_ID,START_DATE,END_DATE)AS
(
  SELECT 1,14,'2021-02-18','2022-04-18' UNION ALL
  SELECT 1,13,'2019-01-01','2020-01-01' UNION ALL
  SELECT 1,12,'2018-01-01','2019-01-01' UNION ALL
  SELECT 1,11,'2017-02-13','2019-02-13' UNION ALL
  SELECT 2,23,'2021-06-19','2022-04-18' UNION ALL
  SELECT 2,22,'2019-07-01','2020-07-01' UNION ALL
  SELECT 2,21,'2019-01-19','2020-01-19' 
),
CTE AS
(
  SELECT USER_ID,START_DATE AS TS ,1 AS TYPE,
  ROW_NUMBER()OVER(PARTITION BY USER_ID ORDER BY START_DATE) AS S,NULL AS E 
  FROM DATAD

   UNION ALL

 SELECT USER_ID,END_DATE AS TS,-1 AS TYPE,
 NULL,ROW_NUMBER()OVER(PARTITION BY USER_ID ORDER BY END_DATE)AS E
 FROM DATAD
),
C2 AS
(
  SELECT *,ROW_NUMBER()OVER(PARTITION BY USER_ID ORDER BY TS,TYPE DESC)AS SE 
  FROM CTE 
),
C3 AS 
(
 SELECT *,
  FLOOR((ROW_NUMBER()OVER(PARTITION BY USER_ID ORDER BY TS) 1)/2)AS P
 FROM C2
 CROSS APPLY (VALUES(S-(SE-S)-1,(SE-E)-E)) AS A(CS,CE)
 WHERE CS=0 OR CE=0
)
SELECT USER_ID,MIN(TS)AS START_TIME,MAX(TS)AS END_TIME
FROM C3
GROUP BY USER_ID,P
ORDER BY USER_ID;

This is a "copy-paste" from the book "T-SQL Querying" by Itzik Ben-Gan. Paragraph about "packing intervals"

CodePudding user response:

Ok, so I used the following logic to get answer-

  1. Augment end date on ranges to larger of the ranges which overlap
  2. Eliminate all ranges which lie insider another range totally.

So I used a cte to map out first step and a WHERE NOT EXISTS for second step like this

; with extendedset as 
(
select 
  A.user_id,
  A.contract_id, 
  A.start_date, 
  max(case when B.user_id is null then A.end_date else B.end_date end) end_date
from t A left join t B 
 on 
  A.user_id=B.user_id 
  and A.contract_id<>B.contract_id
  and A.end_date>=B.start_date
  and A.end_date <=B.end_date
group by A.user_id,
         A.contract_id, 
         A.start_date
)
select 
   user_id,start_date, end_date 
from extendedset t1
where not exists 
( select 1 from extendedset t2
where t1.user_id=t2.user_id and t1.contract_id<>t2.contract_id
and t1.start_date >= t2.start_date and t1.end_date <=t2.end_date)

see working dbfiddle

  • Related