Find 5 consecutive row values in Pandas Dataframe that are equal

Time:03-04

I have a DataFrame of the following format:

Col1    Col2  Col3
Time1   a     Match
Time2   b     NaN
Time3   c     Match
Time4   d     Match
Time5   e     Match
Time6   f     Match
Time7   g     Match
Time8   h     NaN
Time9   i     Match
Time10  j     NaN
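To experiment with the answers, the sample frame can be rebuilt as follows. This is a sketch that assumes the empty Col3 cells are real NaN values, not the string "NaN".

```python
import numpy as np
import pandas as pd

# Sample data from the question; the non-Match cells in Col3 are NaN
df = pd.DataFrame({
    'Col1': [f'Time{i}' for i in range(1, 11)],
    'Col2': list('abcdefghij'),
    'Col3': ['Match', np.nan, 'Match', 'Match', 'Match',
             'Match', 'Match', np.nan, 'Match', np.nan],
})
print(df)
```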

For this DataFrame, I would like to find the rows from Time3 to Time7, because Col3 contains Match in 5 consecutive rows there.

I tried using a rolling window:

idx = (df['Col3'].rolling(window=5) == "Match")

But according to the docs, rolling seems to be meant for calculations rather than direct comparison.

Is there a more efficient way to do this?

Answer:

One classic approach is boolean indexing with a custom mask: split Col3 into runs of consecutive identical values, compute the size of each run, and keep the Match rows whose run is at least 5 rows long.

m = df['Col3'].eq('Match')
runs = m.ne(m.shift()).cumsum()  # new run id wherever the value flips
df[m & m.groupby(runs).transform('size').ge(5)]  # Match rows in runs of 5+
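The same run-length mask can be wrapped in a small reusable function. The name `rows_in_runs` is hypothetical (it is not a pandas API); the sketch assumes you want every row equal to `value` that sits in a run of at least `min_len` equal rows.

```python
import numpy as np
import pandas as pd

def rows_in_runs(s: pd.Series, value, min_len: int) -> pd.Series:
    """Hypothetical helper: True where s == value inside a run of >= min_len rows."""
    hit = s.eq(value)
    # a new run id starts wherever the boolean mask changes value
    run_id = hit.ne(hit.shift()).cumsum()
    # length of the run each row belongs to
    run_len = hit.groupby(run_id).transform('size')
    return hit & run_len.ge(min_len)

df = pd.DataFrame({
    'Col1': [f'Time{i}' for i in range(1, 11)],
    'Col2': list('abcdefghij'),
    'Col3': ['Match', np.nan, 'Match', 'Match', 'Match',
             'Match', 'Match', np.nan, 'Match', np.nan],
})
mask = rows_in_runs(df['Col3'], 'Match', 5)
print(df[mask])
```

Because only rows equal to `value` are counted, a run of 4 Match rows preceded by a NaN is not picked up, and a run that starts at the very first row keeps that first row.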

Alternatively:

m = df['Col3'].ne('Match')
# start a new group at every non-Match row and at the row right after it
m2 = df.groupby((m | m.shift(fill_value=False)).cumsum())['Col3'].transform('size').ge(5)
df[m2]
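To see why the alternative works, it can help to print its group key next to the data. Each non-Match row starts a group, and so does the row right after it, so every non-Match row ends up alone in a group of size 1 while each stretch of Match rows gets its own key. The `key` and `size` column names are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Col1': [f'Time{i}' for i in range(1, 11)],
    'Col2': list('abcdefghij'),
    'Col3': ['Match', np.nan, 'Match', 'Match', 'Match',
             'Match', 'Match', np.nan, 'Match', np.nan],
})
m = df['Col3'].ne('Match')
# group key: increments at each non-Match row and at the row after it
key = (m | m.shift(fill_value=False)).cumsum()
# size of the group each row belongs to (NaN rows are counted too)
size = df.groupby(key)['Col3'].transform('size')
print(df.assign(key=key, size=size))
```

Since non-Match rows always form groups of size 1, this only stays correct for a minimum run length greater than 1.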

Output:

    Col1 Col2   Col3
2  Time3    c  Match
3  Time4    d  Match
4  Time5    e  Match
5  Time6    f  Match
6  Time7    g  Match
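The rolling window from the question can also work if the comparison is done first: rolling aggregations need numeric input, so the Match test is converted to 0/1, and a 5-row window whose sum is 5 marks the last row of an all-Match stretch. The window flags only its end row, so the mask is then spread back over the preceding 4 rows with `shift`. This is a sketch assuming a fixed length `n = 5`; unlike the groupby answers, it does not give run lengths.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Col1': [f'Time{i}' for i in range(1, 11)],
    'Col2': list('abcdefghij'),
    'Col3': ['Match', np.nan, 'Match', 'Match', 'Match',
             'Match', 'Match', np.nan, 'Match', np.nan],
})
n = 5
hit = df['Col3'].eq('Match').astype(int)
# True at the last row of every n-row window that is entirely Match
end = hit.rolling(n).sum().eq(n)
# a row qualifies if an all-Match window ends at it or within the next n-1 rows
in_run = pd.concat([end.shift(-k, fill_value=False) for k in range(n)], axis=1).any(axis=1)
print(df[in_run])
```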