Postgres - partitioning a table-CodePudding

How does the code look like to partition the following table. date and status are given, partition column shall be added. Column group is only to explain where the group starts and ends. Finally, I like to do some analytics, e.g. how long takes the process per group.

In words but don't know to convert to code: status 'approved' always defines the end. Only an 'open' after 'approval' defines the start. The other 'open' are not relevant.

date	status	Group	Partition
1.10.2022	open	Group 1 Starts	1
2.10.2022	waiting		1
3.10.2022	open		1
4.10.2022	waiting		1
5.10.2022	approved	Group 1 Ends	1
7.10.2022	open	Group 2 Start	2
8.10.2022	waiting		2
9.10.2022	open		2
10.10.2022	waiting		2
11.10.2022	open		2
12.10.2022	waiting		2
15.10.2022	approved	Group 2 Ends	2
17.10.2022	open	Group 3 Starts	3
20.10.2022	waiting		3

Thanks for the solution. Works fine :-) And sorry for not using the right expression. If Group is better than Partition even better...

Can we make it slightly more complicated?

This patter in the table applis to several parent records. So in reality there is an additional column Parent ID. This table below is then for example for parent ID A. There are many more parents.

How can an additional grouping be added by Parent ID? At eeach new parent the counting starts again at 1

CodePudding user response：

Assuming you have the first two columns and want to derive the last two, treat this as a gaps-and-islands problem:

with groups as (  -- Assign partitions
  select *, 
         coalesce(
           sum(case when status = 'approved' then 1 else 0 end) 
             over (order by date rows between unbounded preceding 
                                          and 1 preceding),
           0
         )   1 as partition
    from do_part
)
select date, status, 
       case   -- Construct text descriptions
         when partition != coalesce(lead(partition) over w, partition) 
           then format('Group %s Ends', partition)
         when partition = lag(partition) over w
           then '' 
         else format('Group %s Starts', partition)
       end as "group",
       partition
  from groups
window w as (order by date);

Fiddle here

CodePudding user response：

demo based on (Mike Organek)'s fiddle.
idea: left join with distinct on can properly cut the group.

SELECT DISTINCT ON (date,status)
    date,
    status,
    coalesce(date_d, CURRENT_DATE) AS date_end
FROM
    do_part t
    LEFT JOIN (
        SELECT
            date AS date_d
        FROM
            do_part
        WHERE
            status = 'approved'
        ORDER BY
            date) s ON s.date_d >= t.date
ORDER BY
    date,status,
    date_d;

Final query (can be simplified):

   WITH cte AS (
    SELECT DISTINCT ON (date,
        status)
        date,
        status,
        coalesce(date_d, CURRENT_DATE) AS date_end
    FROM
        do_part t
        LEFT JOIN (
            SELECT
                date AS date_d
            FROM
                do_part
            WHERE
                status = 'approved'
            ORDER BY
                date) s ON s.date_d >= t.date
        ORDER BY
            date,
            status,
            date_d
),
cte1 AS (
    SELECT
        *,
        date_end - first_value(date) OVER (PARTITION BY date_end ORDER BY date) AS date_gap,
    dense_rank() OVER (ORDER BY date_end),
    CASE WHEN (date = first_value(date) OVER (PARTITION BY date_end ORDER BY date)) THEN
        'group begin'
    WHEN (status = 'approved') THEN
        'group end '
    ELSE
        NULL
    END AS grp
FROM
    cte
)
SELECT
    *,
    CASE WHEN grp IS NOT NULL THEN
        grp || dense_rank::text
    END
FROM
    cte1;