Find 5 consecutive row values in Pandas Dataframe that are equal

I have a DataFrame in the following format:

Col1    Col2  Col3
Time1   a     Match
Time2   b     NaN
Time3   c     Match
Time4   d     Match
Time5   e     Match
Time6   f     Match
Time7   g     Match
Time8   h     NaN
Time9   i     Match
Time10  j     NaN

For this DataFrame, I would like to select the rows from Time3 to Time7, because Col3 has the value Match in 5 consecutive rows.

I tried using a rolling window:

idx = (df['Col3'].rolling(window=5) == "Match")

But according to the docs, rolling seems to be meant for calculations rather than direct comparison.

Is there another efficient way to go about it?

CodePudding user response:

One classic way is boolean indexing with a custom mask. It works by grouping consecutive Match values into runs, then using the size of each run to select only the rows that belong to a run of at least 5.

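is_match = df['Col3'].eq('Match')
# every non-Match row opens a new group; summing counts only the Match rows in it
run_len = is_match.astype(int).groupby((~is_match).cumsum()).transform('sum')
df[is_match & run_len.ge(5)]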
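
To make the grouping step concrete, here is a minimal sketch that rebuilds the sample frame from the question and exposes the intermediate values. The names `is_match`, `run_id`, `run_len` and `steps` are illustrative, not from the original answer. It assumes that only the Match rows in each group should be counted, so the non-Match row that opens a group does not inflate the run length.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Col1": [f"Time{i}" for i in range(1, 11)],
    "Col2": list("abcdefghij"),
    "Col3": ["Match", np.nan, "Match", "Match", "Match",
             "Match", "Match", np.nan, "Match", np.nan],
})

is_match = df["Col3"].eq("Match")   # True where the row is a Match
run_id = (~is_match).cumsum()       # increases by 1 at every non-Match row
# Number of Match rows that share each run_id
run_len = is_match.astype(int).groupby(run_id).transform("sum")

# Side-by-side view of the intermediate columns
steps = df.assign(is_match=is_match, run_id=run_id, run_len=run_len)
print(steps)

# Keep Match rows whose run has at least 5 members
result = df[is_match & run_len.ge(5)]
print(result)
```

Printing `steps` shows which run each row was assigned to, which is handy when a threshold other than 5 gives surprising results.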

Alternatively:

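m = df['Col3'].ne('Match')
# start a new group at every non-Match row and at the row right after one,
# so each run of Match values gets a group of its own
m2 = df.groupby((m | m.shift(fill_value=False)).cumsum())['Col3'].transform('size').ge(5)
df[m2]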

output:

    Col1 Col2   Col3
2  Time3    c  Match
3  Time4    d  Match
4  Time5    e  Match
5  Time6    f  Match
6  Time7    g  Match
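
Since the question asks about `rolling`: it can be made to work by rolling over a numeric 0/1 flag instead of over the strings. The sketch below is an alternative to the groupby approach and is not part of the original answer; the helper name `rows_in_runs` and its parameters are made up for illustration.

```python
import numpy as np
import pandas as pd


def rows_in_runs(df, col="Col3", value="Match", n=5):
    """Return the rows of df inside a run of at least n consecutive `value` entries in `col`.

    Hypothetical helper, written only to illustrate the rolling-based idea.
    """
    flag = df[col].eq(value).astype(int)
    # 1 on the last row of every window of n rows that are all matches
    window_end = flag.rolling(n).sum().eq(n).astype(int)
    # Look forward by rolling over the reversed series: row i qualifies
    # if any of the rows i .. i+n-1 ends a full window
    mask = window_end[::-1].rolling(n, min_periods=1).max()[::-1].astype(bool)
    return df[mask]


# Sample data from the question
df = pd.DataFrame({
    "Col1": [f"Time{i}" for i in range(1, 11)],
    "Col2": list("abcdefghij"),
    "Col3": ["Match", np.nan, "Match", "Match", "Match",
             "Match", "Match", np.nan, "Match", np.nan],
})

result = rows_in_runs(df)
print(result)
```

For the sample frame this should select the same Time3 to Time7 rows as the groupby masks. The groupby version is likely the simpler choice; the rolling version is mainly useful if the rest of the code already works with rolling windows.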