count sequence of value in df-CodePudding

I have df:

id  color val
1     y    3
1     y    3
1     y    3
2     y    1
2     r    2

I want to count the condition that the value in val column is <=3 at least 3 times in the same id and color in a row(in sequence), and when the condition is true, to return the id and color. for example I will get here 1,y thanks

CodePudding user response：

You can use:

N = 3 # number consecutive matches
s = (df['val'].le(3) # condition: value ≤ 3
 .groupby([df['id'], df['color']]) # make groups
 # is there any occurrence where there are N True in a row?
 .apply(lambda s: s.rolling(N).sum().eq(N).any())
)

# keep only the True and convert to list
s[s].index.to_list()

output: [(1, 'y')]

intermediate s:

id  color
1   y         True
2   r        False
    y        False
Name: val, dtype: bool

CodePudding user response：

You can try groupby and filter

cols = ['id', 'color']
out = (df.groupby(cols)
       .filter(lambda df: df['val'].le(3).sum() >= 3)
       .drop_duplicates(cols)[cols])

print(out)

   id color
0   1     y