Count the number of rows prior to the most recent instance of a row matching a condition

Time:10-16

Given the following dataframe:

id_ col
1 0
2 1
3 0
4 0
5 1
6 0
7 0

I'm looking to create a column that counts the number of rows since the most recent instance of col == 1. The final dataframe should look like:

id_ col count
1 0
2 1
3 0 1
4 0 2
5 1 3
6 0 1
7 0 2

I can build a simple looping function but I'm sure there's a clever vectorised way to do this. Any ideas would be most welcome!
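For reference, the looping version mentioned above might look something like the sketch below. The function name `rows_since_last_one` is made up for illustration, and it assumes that rows up to and including the first 1 should be left empty (NaN), as in the expected output.

```python
import math


def rows_since_last_one(values):
    """Loop baseline: for each row, count the rows since the most recent earlier 1."""
    counts = []
    since = None  # stays None until the first 1 has been seen
    for value in values:
        # Record the distance to the last 1 seen *before* this row
        counts.append(math.nan if since is None else since)
        if value == 1:
            since = 1  # the next row is one row after this 1
        elif since is not None:
            since += 1
    return counts


result = rows_since_last_one([0, 1, 0, 0, 1, 0, 0])
```

With the dataframe from the question this could be used as `df['count'] = rows_since_last_one(df['col'])`. It is handy as a correctness check for any vectorised version.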

Here's the code to create the dataframe:

from pandas import DataFrame

df = DataFrame(
    {
        'id_': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7},
        'col': {0: 0, 1: 1, 2: 0, 3: 0, 4: 1, 5: 0, 6: 0},
    }
)

Answer:

Here is one way to do it:


# Replace the 0s with NaN, keeping the 1s, then take the cumulative sum so that
# each 1 gets its own running number (1, 2, ...).
# Forward-fill that number over the 0s that follow, then shift down one row so
# that a row holding a 1 is still counted as part of the previous group.

df['count'] = df['col'].mask(df['col'].eq(0)).cumsum().ffill().shift(1)


# Number the rows within each of the groups formed above, starting from 1.
# Rows before the first 1 have no group, so they stay NaN.
# This returns the expected result.

df['count'] = df[df['count'].notna()].groupby('count')['count'].transform('cumcount') + 1
df


   id_  col  count
0    1    0    NaN
1    2    1    NaN
2    3    0    1.0
3    4    0    2.0
4    5    1    3.0
5    6    0    1.0
6    7    0    2.0
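If you want this as a single reusable step, the same idea can be wrapped in a function. This is only a sketch: the name `count_since_last_one` is hypothetical, and it builds the group labels from a boolean cumulative sum instead of `mask`/`ffill`, which should give the same groups for a column that contains only 0s and 1s.

```python
import pandas as pd


def count_since_last_one(col: pd.Series) -> pd.Series:
    """Rows elapsed since the most recent earlier 1; NaN where no 1 has occurred yet."""
    # Number of 1s seen strictly before each row (NaN for the very first row)
    ones_before = col.eq(1).cumsum().shift(1)
    # Only rows that come after at least one 1 get a count
    valid = ones_before.gt(0)
    # Position of each row within its group, starting from 1
    counts = col[valid].groupby(ones_before[valid]).cumcount() + 1
    # Put the rows without a preceding 1 back as NaN
    return counts.reindex(col.index)


df = pd.DataFrame({'id_': range(1, 8), 'col': [0, 1, 0, 0, 1, 0, 0]})
df['count'] = count_since_last_one(df['col'])
```

Because the result is reindexed against the original column, the assignment lines up by index in the same way as the two-step version.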