I have df:
id color val
1 y 3
1 y 3
1 y 3
2 y 1
2 r 2
I want to count the condition that the value in val column is <=3 at least 3 times in the same id and color in a row(in sequence), and when the condition is true, to return the id and color. for example I will get here 1,y thanks
CodePudding user response:
You can use:
N = 3 # number consecutive matches
s = (df['val'].le(3) # condition: value ≤ 3
.groupby([df['id'], df['color']]) # make groups
# is there any occurrence where there are N True in a row?
.apply(lambda s: s.rolling(N).sum().eq(N).any())
)
# keep only the True and convert to list
s[s].index.to_list()
output: [(1, 'y')]
intermediate s
:
id color
1 y True
2 r False
y False
Name: val, dtype: bool
CodePudding user response:
You can try groupby
and filter
cols = ['id', 'color']
out = (df.groupby(cols)
.filter(lambda df: df['val'].le(3).sum() >= 3)
.drop_duplicates(cols)[cols])
print(out)
id color
0 1 y