Select beginning of latest consecutive groups for all values in PostgreSQL-CodePudding

I have the following table:

weekStart	bcid
2022-07-18	df9bd190-417d-4007-9dc7-1de0a5ba4045
2022-07-25	e46ac8ed-bcc5-4dab-9694-479637e1ee99
2022-08-01	e46ac8ed-bcc5-4dab-9694-479637e1ee99
2022-08-01	df9bd190-417d-4007-9dc7-1de0a5ba4045
2022-08-08	df9bd190-417d-4007-9dc7-1de0a5ba4045
2022-08-08	ca3fd5c6-73d0-4104-aa03-dcc0eb263c0a
2022-08-08	e46ac8ed-bcc5-4dab-9694-479637e1ee99

I need to find for every bcid the latest weekStart after which it appeared every "week" (i.e. group). That is, each value may appear consistently every week, then disappear and, at some point, come back. And I need the start date of this last (newest) group in which value started to appear consecutively again.

E.g. for df9bd190-417d-4007-9dc7-1de0a5ba4045 this should be 2022-08-01 and not 2022-07-18, because it was missing in 2022-07-25.

I currently think towards windowed functions, but not sure how to get to the "consecutively" part:

select bcid, min(min("weekStart")) over (partition by bcid) as "firstAppearedLatest"
from tx
group by bcid

Can assume that there are no weeks skipped between first and last weekStart values.

SQLFiddle

CodePudding user response：

First, mark all the groups using lag() to check to see if there is a break.

Next, give group numbers to the groupings (casting boolean to int gives 1 for true and 0 for false).

Use distinct on to grab the record to keep:

with mark_gaps as (
  select *,
         "weekStart" > lag("weekStart") over w   interval '7 days' as new_group
    from tx
  window w as (partition by bcid order by "weekStart")
), group_nums as (
  select *, 
         sum(coalesce(new_group, true)::int) over w as group_num
    from mark_gaps
  window w as (partition by bcid order by "weekStart")
)
select distinct on (bcid) bcid, "weekStart"
  from group_nums
 order by bcid, group_num desc, "weekStart";

Updated fiddle here