When the start
column is true, start counting.
When the end
column is true, stop counting.
Input:
import pandas as pd
df=pd.DataFrame()
df['start']=[False,True,False,False,False,True,False,False,False]
df['end']= [False,False,False,True,False,False,False,True,False]
Expected Output:
start end expected
0 False False 0
1 True False 1
2 False False 2
3 False True 0
4 False False 0
5 True False 1
6 False False 2
7 False True 0
8 False False 0
CodePudding user response:
You can use cumsum
to compute the groups, groupby.cummax
to identify the values after a start (and later mask with where
) and groupby.cumcount
to increment a counter:
# make groups between start/end
group = (df['start']|df['end']).cumsum()
# identify values after a start and before an end
mask = df['start'].groupby(group).cummax()
# compute a cumcount and mask with the above "mask"
df['expected'] = df.groupby(group).cumcount().add(1).where(mask, 0)
Output:
start end expected
0 False False 0
1 True False 1
2 False False 2
3 False True 0
4 False False 0
5 True False 1
6 False False 2
7 False True 0
8 False False 0