I have a DataFrame in the following format:
Col1 Col2 Col3
Time1 a Match
Time2 b NaN
Time3 c Match
Time4 d Match
Time5 e Match
Time6 f Match
Time7 g Match
Time8 h NaN
Time9 i Match
Time10 j NaN
For this DataFrame I would like to get the rows from Time3 to Time7, because Col3 contains Match in 5 consecutive rows there.
I tried using a rolling window:
idx = (df['Col3'].rolling(window=5) == "Match")
But according to the docs, rolling windows seem to be meant for calculations (aggregations) rather than direct comparisons like this.
Is there a better, more efficient way to do this?
Answer:
One classic way is to use boolean indexing with a custom mask: group the consecutive Match values, measure the length of each run, and keep only the rows that belong to a run of at least 5.
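is_match = df['Col3'].eq('Match')
# count only "Match" rows per group, so the non-"Match" row that opens each group is not counted
m = is_match.astype(int).groupby(df['Col3'].ne('Match').cumsum()).transform('sum').ge(5)
df[m & is_match]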
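To see what the grouping key in the first approach does, here are its intermediate values on the question's sample. The frame is rebuilt with NaN for the blank Col3 cells, which is an assumption about the original data. Each non-Match row increments the counter, so it opens a new group that also holds the Match rows following it, and a plain row count per group therefore includes that leading row.

```python
import numpy as np
import pandas as pd

# Sample frame from the question (NaN assumed for the non-matching cells)
df = pd.DataFrame({
    'Col1': [f'Time{i}' for i in range(1, 11)],
    'Col2': list('abcdefghij'),
    'Col3': ['Match', np.nan, 'Match', 'Match', 'Match',
             'Match', 'Match', np.nan, 'Match', np.nan],
})

# Every non-"Match" row bumps the counter and opens a new group
key = df['Col3'].ne('Match').cumsum()
print(key.tolist())  # [0, 1, 1, 1, 1, 1, 1, 2, 2, 3]

# Row counts per group include the non-"Match" row that opened the group
sizes = df.groupby(key)['Col3'].transform('size')
print(sizes.tolist())  # [1, 6, 6, 6, 6, 6, 6, 2, 2, 1]
```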
Alternatively:
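# non-"Match" rows (NaN included)
m = df['Col3'].ne('Match')
# start a new group at every non-"Match" row and at the row right after it,
# so each run of consecutive "Match" rows gets a group of its own
m2 = df.groupby((m | m.shift(fill_value=False)).cumsum())['Col3'].transform('size').ge(5)
df[m2]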
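The second approach can be wrapped in a small reusable function. `rows_in_long_runs` and its parameters are hypothetical names chosen for this sketch, and the sample frame again assumes NaN for the blank cells. The extra `~other` filter is a defensive choice: non-matching rows already form single-row groups, but the filter keeps them out even when `min_len` is 1.

```python
import numpy as np
import pandas as pd


def rows_in_long_runs(df: pd.DataFrame, col: str, value, min_len: int) -> pd.DataFrame:
    """Return the rows of `df` lying in a run of at least `min_len`
    consecutive rows where `df[col] == value` (hypothetical helper)."""
    other = df[col].ne(value)
    # New group at every non-matching row and at the row right after one,
    # so each run of matching rows forms a group of its own
    key = (other | other.shift(fill_value=False)).cumsum()
    run_len = df.groupby(key)[col].transform('size')
    # Keep rows in long enough groups, and never the non-matching rows themselves
    return df[run_len.ge(min_len) & ~other]


df = pd.DataFrame({
    'Col1': [f'Time{i}' for i in range(1, 11)],
    'Col2': list('abcdefghij'),
    'Col3': ['Match', np.nan, 'Match', 'Match', 'Match',
             'Match', 'Match', np.nan, 'Match', np.nan],
})

print(rows_in_long_runs(df, 'Col3', 'Match', 5))
```

Passing a different `value` or `min_len` covers other labels or run lengths without touching the grouping logic.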
output:
Col1 Col2 Col3
2 Time3 c Match
3 Time4 d Match
4 Time5 e Match
5 Time6 f Match
6 Time7 g Match
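For completeness, the rolling idea from the question can also work if the comparison happens before the window: convert Col3 to 0/1 first, since rolling aggregations such as `sum` and `max` operate on numeric values. This is a sketch under the same NaN assumption; `window`, `window_end` and `in_window` are illustrative names. A full window of matches is detected at its last row, and a rolling max over the reversed series then marks the preceding `window - 1` rows as well.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Col1': [f'Time{i}' for i in range(1, 11)],
    'Col2': list('abcdefghij'),
    'Col3': ['Match', np.nan, 'Match', 'Match', 'Match',
             'Match', 'Match', np.nan, 'Match', np.nan],
})

window = 5

# Compare first, then let rolling() aggregate plain numbers
is_match = df['Col3'].eq('Match').astype(int)

# True on the last row of every window whose rows are all "Match"
window_end = is_match.rolling(window).sum().eq(window)

# Spread each window end back over the window's rows: reverse the series,
# take a rolling max of the same width, then restore the original order
in_window = (
    window_end.astype(int)
    .iloc[::-1]
    .rolling(window, min_periods=1)
    .max()
    .iloc[::-1]
    .astype(bool)
)

print(df[in_window])
```

Like the groupby versions, this returns every row of a run longer than 5, because each such row lies in at least one all-Match window. The groupby approach is usually simpler when whole runs are needed; the rolling version is convenient if only the window ends matter.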