How does the code look like to partition the following table. date and status are given, partition column shall be added. Column group is only to explain where the group starts and ends. Finally, I like to do some analytics, e.g. how long takes the process per group.
In words but don't know to convert to code: status 'approved' always defines the end. Only an 'open' after 'approval' defines the start. The other 'open' are not relevant.
date | status | Group | Partition |
---|---|---|---|
1.10.2022 | open | Group 1 Starts | 1 |
2.10.2022 | waiting | 1 | |
3.10.2022 | open | 1 | |
4.10.2022 | waiting | 1 | |
5.10.2022 | approved | Group 1 Ends | 1 |
7.10.2022 | open | Group 2 Start | 2 |
8.10.2022 | waiting | 2 | |
9.10.2022 | open | 2 | |
10.10.2022 | waiting | 2 | |
11.10.2022 | open | 2 | |
12.10.2022 | waiting | 2 | |
15.10.2022 | approved | Group 2 Ends | 2 |
17.10.2022 | open | Group 3 Starts | 3 |
20.10.2022 | waiting | 3 |
Thanks for the solution. Works fine :-) And sorry for not using the right expression. If Group is better than Partition even better...
Can we make it slightly more complicated?
This patter in the table applis to several parent records. So in reality there is an additional column Parent ID. This table below is then for example for parent ID A. There are many more parents.
How can an additional grouping be added by Parent ID? At eeach new parent the counting starts again at 1
CodePudding user response:
Assuming you have the first two columns and want to derive the last two, treat this as a gaps-and-islands problem:
with groups as ( -- Assign partitions
select *,
coalesce(
sum(case when status = 'approved' then 1 else 0 end)
over (order by date rows between unbounded preceding
and 1 preceding),
0
) 1 as partition
from do_part
)
select date, status,
case -- Construct text descriptions
when partition != coalesce(lead(partition) over w, partition)
then format('Group %s Ends', partition)
when partition = lag(partition) over w
then ''
else format('Group %s Starts', partition)
end as "group",
partition
from groups
window w as (order by date);
CodePudding user response:
demo based on (Mike Organek)'s fiddle.
idea: left join with distinct on can properly cut the group.
SELECT DISTINCT ON (date,status)
date,
status,
coalesce(date_d, CURRENT_DATE) AS date_end
FROM
do_part t
LEFT JOIN (
SELECT
date AS date_d
FROM
do_part
WHERE
status = 'approved'
ORDER BY
date) s ON s.date_d >= t.date
ORDER BY
date,status,
date_d;
Final query (can be simplified):
WITH cte AS (
SELECT DISTINCT ON (date,
status)
date,
status,
coalesce(date_d, CURRENT_DATE) AS date_end
FROM
do_part t
LEFT JOIN (
SELECT
date AS date_d
FROM
do_part
WHERE
status = 'approved'
ORDER BY
date) s ON s.date_d >= t.date
ORDER BY
date,
status,
date_d
),
cte1 AS (
SELECT
*,
date_end - first_value(date) OVER (PARTITION BY date_end ORDER BY date) AS date_gap,
dense_rank() OVER (ORDER BY date_end),
CASE WHEN (date = first_value(date) OVER (PARTITION BY date_end ORDER BY date)) THEN
'group begin'
WHEN (status = 'approved') THEN
'group end '
ELSE
NULL
END AS grp
FROM
cte
)
SELECT
*,
CASE WHEN grp IS NOT NULL THEN
grp || dense_rank::text
END
FROM
cte1;