The problem I am facing is how to find distinct time periods from multiple time periods with overlap.
For example, the attached tables contain multiple overlapping time periods, how can I combine those time periods into 3 unique time periods in SQL???
I think I can do it in python with the loop function, but not sure how to do it in SQL
ID | Start Date | End Date |
---|---|---|
001 | 2005-01-01 | 2006-01-01 |
001 | 2005-01-01 | 2007-01-01 |
001 | 2008-01-01 | 2008-06-01 |
001 | 2008-04-01 | 2008-12-01 |
001 | 2010-01-01 | 2010-05-01 |
001 | 2010-04-01 | 2010-12-01 |
001 | 2010-11-01 | 2012-01-01 |
My expected result is:
ID | start_Date | end_date |
---|---|---|
001 | 2005-01-01 | 2007-01-01 |
001 | 2008-01-01 | 2008-12-01 |
001 | 2010-01-01 | 2012-01-01 |
CodePudding user response:
This is a gaps and islands problem. Try this:
with u as
(select *,
case
when start_date <= lag(end_date) over(partition by ID order by start_date, end_date) then 0
else 1 end as grp
from table_name),
v as
(select ID, start_date, end_date,
sum(grp) over(partition by ID order by start_date, end_date) as island
from u)
select ID, min(start_date) as start_Date, max(end_date) as end_date
from v
group by ID, island;
Basically you can identify "islands" by comparing start_date
of current row to end_date
of previous row (ordered by start_date, end_date), if it precedes it then it's the same island. Then you can do a rolling sum()
to get the island numbers. Finally select min(start_date) and max(end_date) from each island to get the desired output.
CodePudding user response:
This may work ,with little bit of change in function , I tried it in Dbeaver :
select ID,Start_Date,End_Date
from
(
select t.*,
dense_rank () over(partition by extract (year from Start_Date) order BY End_Date desc) drnk
from testing_123 t
) temp
where temp.drnk = 1
ORDER BY Start_Date;
CodePudding user response:
Try this
WITH a as (
SELECT
ID,
LEFT(Start_Date, 4) as Year,
MIN(Start_Date) as New_Start_Date
FROM
TAB1
GROUP BY
ID,
LEFT(Start_Date, 4)
), b as (
SELECT
a.ID,
Year,
New_Start_Date,
End_Date
FROM
a
LEFT JOIN
TAB1
ON LEFT(a.New_Start_Date, 4) = LEFT(TAB1.Start_Date, 4)
)
select
ID,
New_Start_Date as Start_Date,
MAX(End_Date)
from
b
GROUP BY
ID,
New_Start_Date;
Example: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=97f91b68c635aebfb752538cdd752ace