I have a dataframe defined as follows. I'd like to count the number of days (or rows) since the input
column last changed from 1 to 0, without the count dropping back to zero in between:
import pandas as pd
df = pd.DataFrame({'input': [1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0]},
                  index=pd.date_range('2021-10-01', periods=12))
# I can mark the points of interest, i.e. when it goes from 1 to 0
df['change'] = 0
df.loc[(df['input'].shift(1) - df['input']) > 0, 'change'] = 1
print(df)
I end up with the following:
            input  change
2021-10-01      1       0
2021-10-02      1       0
2021-10-03      1       0
2021-10-04      0       1
2021-10-05      0       0
2021-10-06      0       0
2021-10-07      1       0
2021-10-08      1       0
2021-10-09      1       0
2021-10-10      0       1
2021-10-11      0       0
2021-10-12      0       0
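For reference, the same marker column can probably be built in a single expression with diff(). This is only a sketch of an equivalent approach, not part of the original code; the name change_alt is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'input': [1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0]},
                  index=pd.date_range('2021-10-01', periods=12))

# diff() computes input - input.shift(1), so a 1 -> 0 step shows up as -1.
# The first row's diff is NaN, which is not equal to -1, so it is marked 0.
change_alt = df['input'].diff().eq(-1).astype(int)  # hypothetical name
print(change_alt.tolist())
```

This should mark the same two rows (2021-10-04 and 2021-10-10) as the two-step version above.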
What I want is a res column as output, where the count restarts at 1 every time change is 1:
            input  change  res
2021-10-01      1       0    0
2021-10-02      1       0    0
2021-10-03      1       0    0
2021-10-04      0       1    1
2021-10-05      0       0    2
2021-10-06      0       0    3
2021-10-07      1       0    4
2021-10-08      1       0    5
2021-10-09      1       0    6
2021-10-10      0       1    1
2021-10-11      0       0    2
2021-10-12      0       0    3
Note that this is very similar to the question "How to count the number of days since a column flag?", but without zeros in between cases.
CodePudding user response:
You can take the cumsum of change to get a group label that increments at each 1, then use groupby with cumcount to number the rows within each group:
s = df['change'].cumsum()
df['res'] = s.groupby(s).cumcount().add(1).mask(s.eq(0), 0)
output:
            input  change  res
2021-10-01      1       0    0
2021-10-02      1       0    0
2021-10-03      1       0    0
2021-10-04      0       1    1
2021-10-05      0       0    2
2021-10-06      0       0    3
2021-10-07      1       0    4
2021-10-08      1       0    5
2021-10-09      1       0    6
2021-10-10      0       1    1
2021-10-11      0       0    2
2021-10-12      0       0    3
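To see why this works, it may help to split the one-liner into its intermediate steps. The sketch below wraps the same cumsum / groupby / cumcount / mask logic in a helper; the function name days_since_change and the variable names are made up for illustration, and the change values are typed in by hand to match the example above.

```python
import pandas as pd

def days_since_change(change: pd.Series) -> pd.Series:
    """Count rows since the last 1 in `change` (hypothetical helper)."""
    # Running total of the markers: 0 before the first marker,
    # then 1, 2, ... so each marker starts a new group label.
    group_id = change.cumsum()
    # cumcount() numbers rows 0, 1, 2, ... within each group; add 1 so
    # the marker row itself counts as 1.
    position = group_id.groupby(group_id).cumcount() + 1
    # Rows before the first marker (group 0) have nothing to count from,
    # so they are set to 0.
    return position.mask(group_id.eq(0), 0)

# Same `change` column as in the question, entered by hand.
change = pd.Series([0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0],
                   index=pd.date_range('2021-10-01', periods=12))
res = days_since_change(change)
print(res.tolist())
```

The mask step is what keeps the leading rows at 0; without it, the rows before the first marker would be numbered 1, 2, 3 like any other group.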