I'm trying and googling since a few hours, but got no solution that works. Here is my issue:
I have a table with following structure: id SERIAL, datum DATE, otherinfo VARCHAR(50)
The data could have equal date entries:
id datum otherinfo
1 2019-12-28 testdata1
2 2019-12-28 testdata2
3 2019-12-29 testdata3
4 2019-12-29 testdata4
5 2019-12-31 testdata5-begin longest consecutive days
6 2019-12-31 testdata6
7 2020-01-01 testdata7
8 2020-01-01 testdata8
9 2020-01-02 testdata9
10 2020-01-03 testdata10
11 2020-01-04 testdata11
12 2020-01-04 testdata12
13 2020-01-05 testdata13-end longest consecutive days
14 2020-01-22 testdata14
15 2020-01-29 testdata15
16 2020-01-30 testdata16
I'm interested to get the consecutive days with the start and end date. An output like this:
count | date MIN | date MAX
6 2019-12-31 2020-01-05
2 2019-12-28 2019-12-29
2 2020-01-29 2020-01-30
I found a few ideas how to solve here on Stackoverflow, but it seems that it always conflicts with multiple equal date entries.
Most success I had with following SQL query:
SELECT COUNT(*) -1 "count", MAX(datum), MIN(datum) FROM (SELECT *, date(datum) - row_number() OVER (PARTITION BY datum ORDER BY date(datum)) * INTERVAL '1 day' "filter" FROM table ) t1 GROUP BY filter HAVING COUNT(*) -1 > 0 ORDER BY count DESC
Sadly it gives wrong counts of consecutive days and the counted days even don't match with the start / end dates.
Many thanks for ideas
Martin
CodePudding user response:
This is a Gaps & Islands problem. You can use the traditional solution:
select
(max(datum) - min(datum)) 1 as cnt,
min(datum) as date_min,
max(datum) as date_max
from (
select x.*, sum(i) over(order by datum) as g
from (
select t.*,
case when datum > lag(datum) over(order by datum) 1
then 1 else 0 end as i
from t
) x
) y
group by g
CodePudding user response:
This is called a gaps & islands problem. A typical way of solving this is numbering the rows (here with DENSE_RANK
as the dates are not yet unique) and comparing these numbers to a relative position. As you are dealing with dates, our position is the number of days from a fixed date.
select count(*), min(datum), max(datum)
from
(
select distinct
datum,
datum - date '1900-01-01' - dense_rank() over (order by datum) as grp
from mytable
) grouped
group by grp
order by grp;
Demo: https://dbfiddle.uk/?rdbms=postgres_14&fiddle=f89e579db31ffd956fdea5d437625b68
If you find it more readable, you can add a step: first make the rows distinct, then run through the rows and get the diff of position and row number (which you can then get with ROW_NUMBER
instead of DENSE_RANK
), then aggregate.
CodePudding user response:
These are additional MySQL versions of the answers derived from @TheImpaler's and @Thorsten Kettner's answers. FIDDLE
@TheImpaler's Solution
select
datediff(max(datum), min(datum)) 1 as cnt,
min(datum) as date_min,
max(datum) as date_max
from (
select x.*, sum(i) over(order by datum) as g
from (
select t.*,
case when datediff(datum, lag(datum) over(order by datum)) > 1
then 1 else 0 end as i
from t
) x
) y
group by g;
@Thorsten Kettner's Solution
select
count(*) as cnt,
min(datum) as date_min,
max(datum) as date_max
from
(
select distinct
datum,
datediff(datum, date('1900-01-01')) - dense_rank() over (order by datum) as grp
from t order by datum
) grouped
group by grp
order by grp;