I have a dataframe defined as follows. For each row, I'd like to count the number of days (or rows) since the input
column changed from 1 to 0:
import pandas as pd
df = pd.DataFrame({'input': [1,1,1,0,0,0,1,1,1,0,0,0]},
index=pd.date_range('2021-10-01', periods=12))
# I can mark the points of interest, i.e. when it goes from 1 to 0
df['change'] = 0
df.loc[(df['input'].shift(1) - df['input']) > 0, 'change'] = 1
print(df)
I end up with the following:
input change
2021-10-01 1 0
2021-10-02 1 0
2021-10-03 1 0
2021-10-04 0 1
2021-10-05 0 0
2021-10-06 0 0
2021-10-07 1 0
2021-10-08 1 0
2021-10-09 1 0
2021-10-10 0 1
2021-10-11 0 0
2021-10-12 0 0
What I want is a res column, as in this output:
input change res
2021-10-01 1 0 0
2021-10-02 1 0 0
2021-10-03 1 0 0
2021-10-04 0 1 1
2021-10-05 0 0 2
2021-10-06 0 0 3
2021-10-07 1 0 0
2021-10-08 1 0 0
2021-10-09 1 0 0
2021-10-10 0 1 1
2021-10-11 0 0 2
2021-10-12 0 0 3
I know I can use a cumsum, but I can't find a way to "reset" it at the appropriate points:
df['res'] = (1 - df['input']).cumsum()*(1 - df['input'])
However, this keeps accumulating across runs of 0s and does not reset where change == 1.
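To make the problem concrete, here is a minimal sketch of what that attempt produces on the sample data. The names zeros and attempt are introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'input': [1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0]},
                  index=pd.date_range('2021-10-01', periods=12))

zeros = 1 - df['input']              # 1 where input is 0, else 0
attempt = zeros.cumsum() * zeros     # running total, masked back to the 0 rows

# The first run of 0s counts 1, 2, 3, but the second run carries on
# from the running total instead of starting again at 1.
print(attempt.tolist())
# [0, 0, 0, 1, 2, 3, 0, 0, 0, 4, 5, 6]
```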
CodePudding user response:
We can create a boolean Series that is True only where input eq 0, then group by consecutive values and take the groupby cumsum of the boolean Series. This is essentially enumerating groups, but only groups where there are 0s in input:
m = df['input'].eq(0)
df['res'] = m.groupby(m.ne(m.shift()).cumsum()).cumsum()
df:
input change res
2021-10-01 1 0 0
2021-10-02 1 0 0
2021-10-03 1 0 0
2021-10-04 0 1 1
2021-10-05 0 0 2
2021-10-06 0 0 3
2021-10-07 1 0 0
2021-10-08 1 0 0
2021-10-09 1 0 0
2021-10-10 0 1 1
2021-10-11 0 0 2
2021-10-12 0 0 3
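It may help to see the intermediate pieces of that one-liner. The sketch below splits it into named steps; run_start, run_id and steps are names made up for this illustration, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({'input': [1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0]},
                  index=pd.date_range('2021-10-01', periods=12))

m = df['input'].eq(0)              # True on rows where input is 0
run_start = m.ne(m.shift())        # True on the first row of each consecutive run
run_id = run_start.cumsum()        # one label per run: 1, 1, 1, 2, 2, 2, 3, ...
res = m.groupby(run_id).cumsum()   # count the True values within each run

# Side-by-side view of every step
steps = pd.DataFrame({'input': df['input'], 'm': m,
                      'run_id': run_id, 'res': res})
print(steps)
```

Because the cumsum restarts in every group, runs of 1s sum only False values and stay at 0, while each run of 0s counts up from 1.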