Count the number of rows prior to the most recent instance of a row matching a condition

Given the following dataframe:

id_	col
1	0
2	1
3	0
4	0
5	1
6	0
7	0

I'm looking to create a column that counts the number of rows since the most recent instance of col == 1. The final dataframe should look like:

id_	col	count
1	0
2	1
3	0	1
4	0	2
5	1	3
6	0	1
7	0	2

I can build a simple looping function but I'm sure there's a clever vectorised way to do this. Any ideas would be most welcome!

Here's the code to create the dataframe:

from pandas import DataFrame

df = DataFrame(
    {
        'id_': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7},
        'col': {0: 0, 1: 1, 2: 0, 3: 0, 4: 1, 5: 0, 6: 0},
    }
)

CodePudding user response:

Here is one way to do it:


# Mask out the zeros so only the 1s remain, then take a cumulative sum
# so that each 1 gets a running number (1st, 2nd, ...).
# Forward-fill that number so every row carries the number of the most recent 1,
# then shift down one row so that a row containing a 1 still belongs to the previous group.

df['count'] = df['col'].mask(df['col'].eq(0)).cumsum().ffill().shift(1)
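To see what each link in that chain contributes, the expression can be split into named steps. This is only an illustrative sketch: the variable names (`masked`, `running`, `filled`, `group_id`) and the `stages` frame are made up for the example and are not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({'id_': [1, 2, 3, 4, 5, 6, 7],
                   'col': [0, 1, 0, 0, 1, 0, 0]})

# Hypothetical names, used only to expose the intermediate results.
masked = df['col'].mask(df['col'].eq(0))  # NaN where col == 0, 1.0 where col == 1
running = masked.cumsum()                 # each 1 gets a running number; NaN rows stay NaN
filled = running.ffill()                  # every row carries the number of the latest 1
group_id = filled.shift(1)                # move down one row: a row holding a 1 joins the previous group

stages = pd.DataFrame({'col': df['col'], 'masked': masked, 'running': running,
                       'filled': filled, 'group_id': group_id})
print(stages)
```

The `group_id` column is what the first assignment stores in `df['count']`; rows before and including the first 1 stay NaN because no earlier 1 exists for them.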


# Number the rows within each of the groups formed above, starting from 1.
# Rows before the first 1 have no group, so they stay NaN.
# This returns the expected result.

df['count'] = df[df['count'].notna()].groupby('count')['count'].transform('cumcount') + 1
df

   id_  col  count
0    1    0    NaN
1    2    1    NaN
2    3    0    1.0
3    4    0    2.0
4    5    1    3.0
5    6    0    1.0
6    7    0    2.0
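As a possible variation (a sketch, not the method from the answer above), the same counts can be built in one pass without overwriting the column, and kept as integers by using pandas' nullable `Int64` dtype. The names `ones_before`, `position` and the helper `count_since_last_one` are assumptions introduced for this example; the loop is only there as a plain-Python cross-check of the vectorised result.

```python
import pandas as pd

df = pd.DataFrame({'id_': [1, 2, 3, 4, 5, 6, 7],
                   'col': [0, 1, 0, 0, 1, 0, 0]})

# How many 1s occur strictly before each row (0 means no 1 has been seen yet).
ones_before = df['col'].eq(1).cumsum().shift(1, fill_value=0)

# Position of each row within its group, counting from 1.
position = df.groupby(ones_before).cumcount() + 1

# Blank out rows that precede the first 1, then store as nullable integers.
df['count'] = position.where(ones_before.gt(0)).astype('Int64')


def count_since_last_one(values):
    """Loop-based reference: rows elapsed since the previous 1, or None if there is none."""
    out, since = [], None
    for v in values:
        if since is not None:
            since += 1
        out.append(since)
        if v == 1:
            since = 0  # restart the count after this row
    return out


print(df)
print(count_since_last_one(df['col']))
```

Using `Int64` avoids the float display (`1.0`, `2.0`) that appears when NaN forces the column to `float64`; whether that matters depends on how the column is used afterwards.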