I have a dataframe with multiple columns. One of these columns contains only binary values (0 and 1). For example:
df = pd.DataFrame({'A': [0,0,0,0,1,1,1,0,0,0,0,0,1,1,0,0,0,0,1,1,1,1,0,0]})
What I need to do is identify every group of consecutive 1s and number the groups in order: the first group keeps the value 1, the second group becomes 2, the third becomes 3, and so on. The output should be a column as follows:
0,0,0,0,1,1,1,0,0,0,0,0,2,2,0,0,0,0,3,3,3,3,0,0
Is there a way to do this without it being messy and complicated?
Answer:
Use a boolean mask that flags the first row of each group of 1s, then take its cumulative sum:
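import pandas as pd
df = pd.DataFrame({'A': [0,0,0,0,1,1,1,0,0,0,0,0,1,1,0,0,0,0,1,1,1,1,0,0]})
# Look for current row = 1 and previous row = 0 (fillna so a 1 in the first row also starts a group)
m = df['A'].diff().fillna(df['A']).eq(1)
# Number the group starts cumulatively, then reset the rows that are 0 back to 0
df['G'] = m.cumsum().mask(df['A'].eq(0), 0)
print(df)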
# Output
A G # m
0 0 0 # False
1 0 0 # False
2 0 0 # False
3 0 0 # False
4 1 1 # True <- Group 1
5 1 1 # False
6 1 1 # False
7 0 0 # False
8 0 0 # False
9 0 0 # False
10 0 0 # False
11 0 0 # False
12 1 2 # True <- Group 2
13 1 2 # False
14 0 0 # False
15 0 0 # False
16 0 0 # False
17 0 0 # False
18 1 3 # True <- Group 3
19 1 3 # False
20 1 3 # False
21 1 3 # False
22 0 0 # False
23 0 0 # False
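An alternative sketch, not part of the original answer: give every run of equal values its own id with `s.ne(s.shift()).cumsum()`, keep only the ids of the runs of 1s, and dense-rank them so they become 1, 2, 3, ... The helper name `label_one_groups` is hypothetical, and it assumes the column holds only 0s and 1s.

```python
import pandas as pd


def label_one_groups(s: pd.Series) -> pd.Series:
    """Number each run of consecutive 1s as 1, 2, 3, ...; rows with 0 stay 0.

    Hypothetical helper, assuming `s` contains only 0s and 1s.
    """
    # A new run starts wherever the value differs from the previous row
    run_id = s.ne(s.shift()).cumsum()
    # Keep run ids only for the runs of 1s (0 rows become NaN),
    # then renumber those ids densely as 1, 2, 3, ...
    labels = run_id.where(s.eq(1)).rank(method="dense")
    # Put the 0 rows back to 0 and return integer labels
    return labels.fillna(0).astype(int)


df = pd.DataFrame({"A": [0,0,0,0,1,1,1,0,0,0,0,0,1,1,0,0,0,0,1,1,1,1,0,0]})
df["G"] = label_one_groups(df["A"])
print(df["G"].tolist())
# [0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 2, 2, 0, 0, 0, 0, 3, 3, 3, 3, 0, 0]
```

Because the first row is compared against `NaN` from `shift()`, a series that starts with 1s still gets its first group labelled 1, e.g. `[1, 1, 0, 1]` becomes `[1, 1, 0, 2]`.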