Home > Software engineering >  Postgres group data by non gaps
Postgres group data by non gaps

Time:11-12

I am trying to group some data by dates that do not have gaps

I am also actually trying to build this as a view.

identifier start_date     end_date
4EF        2021-07-29     2021-08-05
4EF        2021-08-06     2021-08-09    
4EF        2021-10-07     2021-10-12    
4EF        2021-10-13     2021-11-12    
2AB        2021-01-01     2021-06-20    
2AB        2021-06-22     2021-12-01    

The desired result of the above would be:

identifier start_date     end_date
4EF        2021-07-29     2021-08-09
4EF        2021-10-07     2021-11-12    
2AB        2021-01-01     2021-06-20    
2AB        2021-06-22     2021-12-01 

What would be the most efficient way to do this?

CodePudding user response:

This is fairly tricky problem, because it is a gaps and islands problem, but your initial data set is far removed from having clear values for the gaps or islands. Here is one approach:

WITH cte AS (
    SELECT *, CASE WHEN start_date - LAG(end_date) OVER (PARTITION BY identifier
                                                         ORDER BY start_date) = 1
                   THEN 1 ELSE 0 END AS label
    FROM yourTable
),
cte2 AS (
    SELECT *, SUM(CASE WHEN label = 0 THEN 1 ELSE 0 END)
                  OVER (PARTITION BY identifier ORDER BY start_date) AS grp
    FROM cte
)

SELECT
    identifier,
    MIN(start_date) AS start_date,
    MAX(end_date) AS end_date
FROM cte2
GROUP BY
    identifier,
    grp
ORDER BY
    identifier,
    MIN(start_date);

Demo

The logic of the first CTE is to assign a 0 or 1 depending on whether the previous record for a given identifier were part of the same continuous date range or not. Once we have these 0s and 1s assigned, then we can take a rolling sum within each identifier to come up with a group number. The point here is that we generate a pseudo group for each identifier and continuous dates.

  • Related