Home > OS >  Count similar names block per independent partitions
Count similar names block per independent partitions

Time:11-30

I have a dataframe that looks like this:

id   name     datetime
44   once     2022-11-22T15:41:00
44   once     2022-11-22T15:42:00
44   once     2022-11-22T15:43:00     
44   twice    2022-11-22T15:44:00
44   once     2022-11-22T16:41:00
55   thrice   2022-11-22T17:44:00
55   thrice   2022-11-22T17:46:00
55   once     2022-11-22T17:47:00
55   once     2022-11-22T17:51:00
55   twice    2022-11-22T18:41:00
55   thrice   2022-11-22T18:51:00

My desired output is

id   name     datetime                cnt
44   once     2022-11-22T15:41:00     3
44   once     2022-11-22T15:42:00     3
44   once     2022-11-22T15:43:00     3
44   twice    2022-11-22T15:44:00     1
44   once     2022-11-22T16:41:00     1
55   thrice   2022-11-22T17:44:00     2
55   thrice   2022-11-22T17:46:00     2    
55   once     2022-11-22T17:47:00     2
55   once     2022-11-22T17:51:00     2
55   twice    2022-11-22T18:41:00     1
55   thrice   2022-11-22T18:51:00     1

where the new column, cnt, is the maximum count of the name column per block that they follow themselves consecutively.

I attempted the problem by doing:

select
id,
name,
datetime,
row_number() over (partition by id order by datetime) rn1,
row_number() over (partition by id, name order by name, datetime) rn2
from table

but it is obviously not giving the desired output.

I tried also looking at the solutions in SQL count consecutive days but could not figure out from answers given there.

CodePudding user response:

As noted in the question you linked to, this is a typical gaps & islands problem.

The solution is provided in the answers to that question, but I've applied to your sample data specifically for you here:

with gp as (
  select *,
    Row_Number() over(partition by id order by [datetime]) 
    - Row_Number() over(partition by id, name order by [datetime]) g
  from t
)
select id, name, [datetime],
  Count(*) over(partition by id, name, g) cnt
from gp;

See Demo DBFiddle

  • Related