I have the following table:
weekStart | bcid |
---|---|
2022-07-18 | df9bd190-417d-4007-9dc7-1de0a5ba4045 |
2022-07-25 | e46ac8ed-bcc5-4dab-9694-479637e1ee99 |
2022-08-01 | e46ac8ed-bcc5-4dab-9694-479637e1ee99 |
2022-08-01 | df9bd190-417d-4007-9dc7-1de0a5ba4045 |
2022-08-08 | df9bd190-417d-4007-9dc7-1de0a5ba4045 |
2022-08-08 | ca3fd5c6-73d0-4104-aa03-dcc0eb263c0a |
2022-08-08 | e46ac8ed-bcc5-4dab-9694-479637e1ee99 |
I need to find for every bcid
the latest weekStart
after which it appeared every "week" (i.e. group). That is, each value may appear consistently every week, then disappear and, at some point, come back. And I need the start date of this last (newest) group in which value started to appear consecutively again.
E.g. for df9bd190-417d-4007-9dc7-1de0a5ba4045
this should be 2022-08-01 and not 2022-07-18, because it was missing in 2022-07-25.
I currently think towards windowed functions, but not sure how to get to the "consecutively" part:
select bcid, min(min("weekStart")) over (partition by bcid) as "firstAppearedLatest"
from tx
group by bcid
Can assume that there are no weeks skipped between first and last weekStart
values.
CodePudding user response:
First, mark all the groups using lag()
to check to see if there is a break.
Next, give group numbers to the groupings (casting boolean
to int
gives 1
for true
and 0
for false
).
Use distinct on
to grab the record to keep:
with mark_gaps as (
select *,
"weekStart" > lag("weekStart") over w interval '7 days' as new_group
from tx
window w as (partition by bcid order by "weekStart")
), group_nums as (
select *,
sum(coalesce(new_group, true)::int) over w as group_num
from mark_gaps
window w as (partition by bcid order by "weekStart")
)
select distinct on (bcid) bcid, "weekStart"
from group_nums
order by bcid, group_num desc, "weekStart";