Home > OS >  PostgreSQL - consecutive day with min/max - tables contains multiple equal days
PostgreSQL - consecutive day with min/max - tables contains multiple equal days

Time:05-30

I'm trying and googling since a few hours, but got no solution that works. Here is my issue:

I have a table with following structure: id SERIAL, datum DATE, otherinfo VARCHAR(50)

The data could have equal date entries:

id  datum      otherinfo
1   2019-12-28 testdata1
2   2019-12-28 testdata2
3   2019-12-29 testdata3
4   2019-12-29 testdata4
5   2019-12-31 testdata5-begin longest consecutive days
6   2019-12-31 testdata6
7   2020-01-01 testdata7
8   2020-01-01 testdata8
9   2020-01-02 testdata9
10  2020-01-03 testdata10
11  2020-01-04 testdata11
12  2020-01-04 testdata12
13  2020-01-05 testdata13-end longest consecutive days
14  2020-01-22 testdata14
15  2020-01-29 testdata15
16  2020-01-30 testdata16

I'm interested to get the consecutive days with the start and end date. An output like this:

count | date MIN  | date MAX
6       2019-12-31  2020-01-05
2       2019-12-28  2019-12-29
2       2020-01-29  2020-01-30

I found a few ideas how to solve here on Stackoverflow, but it seems that it always conflicts with multiple equal date entries.

Most success I had with following SQL query:

SELECT COUNT(*) -1 "count", MAX(datum), MIN(datum) FROM (SELECT *, date(datum) - row_number() OVER (PARTITION BY datum ORDER BY date(datum)) * INTERVAL '1 day' "filter" FROM table ) t1 GROUP BY filter HAVING COUNT(*) -1 > 0 ORDER BY count DESC

Sadly it gives wrong counts of consecutive days and the counted days even don't match with the start / end dates.

Many thanks for ideas

Martin

CodePudding user response:

This is a Gaps & Islands problem. You can use the traditional solution:

select
  (max(datum) - min(datum))   1 as cnt,
  min(datum) as date_min,
  max(datum) as date_max
from (
  select x.*, sum(i) over(order by datum) as g
  from (
    select t.*,
      case when datum > lag(datum) over(order by datum)   1 
           then 1 else 0 end as i
    from t
  ) x
) y
group by g

CodePudding user response:

This is called a gaps & islands problem. A typical way of solving this is numbering the rows (here with DENSE_RANK as the dates are not yet unique) and comparing these numbers to a relative position. As you are dealing with dates, our position is the number of days from a fixed date.

select count(*), min(datum), max(datum)
from
(
  select distinct 
    datum,
    datum - date '1900-01-01' - dense_rank() over (order by datum) as grp
  from mytable
) grouped
group by grp
order by grp;

Demo: https://dbfiddle.uk/?rdbms=postgres_14&fiddle=f89e579db31ffd956fdea5d437625b68

If you find it more readable, you can add a step: first make the rows distinct, then run through the rows and get the diff of position and row number (which you can then get with ROW_NUMBER instead of DENSE_RANK), then aggregate.

CodePudding user response:

These are additional MySQL versions of the answers derived from @TheImpaler's and @Thorsten Kettner's answers. FIDDLE

@TheImpaler's Solution

select
  datediff(max(datum), min(datum))   1 as cnt,
  min(datum) as date_min,
  max(datum) as date_max
from (
  select x.*, sum(i) over(order by datum) as g
  from (
    select t.*,
      case when datediff(datum, lag(datum) over(order by datum)) > 1 
           then 1 else 0 end as i
    from t
  ) x
) y
group by g;

@Thorsten Kettner's Solution

select 
    count(*) as cnt,
    min(datum) as date_min,
    max(datum) as date_max
from
(
  select distinct 
    datum,
    datediff(datum, date('1900-01-01')) - dense_rank() over (order by datum) as grp
  from t order by datum
) grouped
group by grp
order by grp;
  • Related