I have the following df:
import pandas as pd
df = pd.DataFrame({"val_a":[True,True,False,False,False,True,False,False,True,True,True,True,False,True,True]})
    val_a
0    True
1    True
2   False
3   False
4   False
5    True
6   False
7   False
8    True
9    True
10   True
11   True
12  False
13   True
14   True
and I wish to have the following result:
    val_a    tx
0    True     0
1    True     0
2   False  None
3   False  None
4   False  None
5    True     1
6   False  None
7   False  None
8    True     2
9    True     2
10   True     2
11   True     2
12  False  None
13   True     3
14   True     3
Explanation: each consecutive run of True values counts as one group. Indexes 0 and 1 belong to the same run, so they share tx 0; the next run is a single True at index 5, so it is marked 1, and so on. False rows get None.
What I have tried: I know that cumsum and groupby must come into play here, but I couldn't figure out how.
g = (df['val_a']==True).cumsum()
df['tx'] = df.groupby(g).ffill()
CodePudding user response:
Identify the groups by taking the cumsum of the inverted mask (~m), then filter the rows having True values and use factorize to assign an ordinal number to each unique group:
m = df['val_a']
df.loc[m, 'tx'] = (~m).cumsum()[m].factorize()[0]
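To make the one-liner easier to follow, here is a minimal sketch that splits it into named steps on the question's data. The intermediate names (run_id, true_runs, codes) are only illustrative and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"val_a": [True, True, False, False, False, True, False, False,
                             True, True, True, True, False, True, True]})
m = df["val_a"]

# Every False row bumps the counter, so consecutive True rows share one value.
run_id = (~m).cumsum()

# Keep only the True rows; their run ids are 0, 0, 3, 5, 5, 5, 5, 6, 6.
true_runs = run_id[m]

# factorize relabels the distinct ids as 0, 1, 2, ... in order of appearance.
codes, uniques = true_runs.factorize()

# Write the labels back to the True rows; False rows are left as NaN.
df.loc[m, "tx"] = codes
```

The run ids themselves have gaps (0, 3, 5, 6) because the False rows in between also advance the counter, which is why the factorize step is needed to get consecutive labels.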
Alternatively, you can use groupby with ngroup:
m = df['val_a']
df['tx'] = m[m].groupby((~m).cumsum()).ngroup()
    val_a   tx
0    True  0.0
1    True  0.0
2   False  NaN
3   False  NaN
4   False  NaN
5    True  1.0
6   False  NaN
7   False  NaN
8    True  2.0
9    True  2.0
10   True  2.0
11   True  2.0
12  False  NaN
13   True  3.0
14   True  3.0
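Note that tx comes out as float because NaN forces a float column, whereas the question asked for whole numbers. If integer labels with missing values are wanted, one option (an addition here, not part of the original answer) is to cast to pandas' nullable Int64 dtype. A sketch, assuming a pandas version that supports nullable integers:

```python
import pandas as pd

df = pd.DataFrame({"val_a": [True, True, False, False, False, True, False, False,
                             True, True, True, True, False, True, True]})
m = df["val_a"]

# Run ids restricted to the True rows, so the grouper index matches m[m].
run_id = (~m).cumsum()[m]

# Number the groups 0, 1, 2, ...; False rows are absent and become NaN on assignment.
df["tx"] = m[m].groupby(run_id).ngroup()

# Cast float-with-NaN to nullable integers: values print as 0, 1, 2, 3 and <NA>.
df["tx"] = df["tx"].astype("Int64")
```

The missing entries then display as <NA> rather than None, but the True rows carry integer labels instead of 0.0, 1.0, and so on.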