How to add a number to a group of rows in a column only when the rows are grouped and have the same

Time:02-17

I have a dataframe with multiple columns. One of these columns holds binary values (0s and 1s). For example:

data = pd.DataFrame([0,0,0,0,1,1,1,0,0,0,0,0,1,1,0,0,0,0,1,1,1,1,0,0])

What I need to do is identify every group of consecutive 1s and number the groups in order: the first group stays 1, the second becomes 2, the third becomes 3, and so on. The output should be a dataframe as follows:

0,0,0,0,1,1,1,0,0,0,0,0,2,2,0,0,0,0,3,3,3,3,0,0

Is there a way to do this without it getting messy and complicated?

CodePudding user response:

Use a boolean mask:

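import pandas as pd

# Sample data from the question, stored in a column named 'A'
df = pd.DataFrame({'A': [0,0,0,0,1,1,1,0,0,0,0,0,1,1,0,0,0,0,1,1,1,1,0,0]})

# A group starts where the current row is 1 and the previous row is 0
m = df['A'].diff().eq(1)
# Running count of group starts, reset to 0 on rows where A is 0
df['G'] = m.cumsum().mask(df['A'].eq(0), 0)
print(df)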

# Output
    A  G  # m
0   0  0  # False
1   0  0  # False
2   0  0  # False
3   0  0  # False
4   1  1  # True  <- Group 1
5   1  1  # False
6   1  1  # False
7   0  0  # False
8   0  0  # False
9   0  0  # False
10  0  0  # False
11  0  0  # False
12  1  2  # True  <- Group 2
13  1  2  # False
14  0  0  # False
15  0  0  # False
16  0  0  # False
17  0  0  # False
18  1  3  # True  <- Group 3
19  1  3  # False
20  1  3  # False
21  1  3  # False
22  0  0  # False
23  0  0  # False
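
One edge case worth noting: diff() yields NaN for the first row, so a column that begins with 1 would leave that first run labelled 0. Below is a minimal sketch of a variant that avoids this by comparing each row with its shifted predecessor. The helper name label_groups and the sample data are illustrative assumptions, not part of the original answer.

```python
import pandas as pd


def label_groups(s: pd.Series) -> pd.Series:
    """Number each run of consecutive 1s (1, 2, 3, ...); rows equal to 0 stay 0."""
    is_one = s.eq(1)
    # A run starts where the current row is 1 and the previous row is not.
    # shift(fill_value=False) treats the row before the first one as 0,
    # so a series that begins with 1 still opens group 1.
    starts = is_one & ~is_one.shift(fill_value=False)
    # Running count of run starts, kept only on rows that are 1.
    return starts.cumsum().where(is_one, 0)


# Hypothetical sample that begins with a run of 1s
df = pd.DataFrame({"A": [1, 1, 0, 0, 1, 0, 1, 1]})
df["G"] = label_groups(df["A"])
print(df["G"].tolist())  # [1, 1, 0, 0, 2, 0, 3, 3]
```

On the data from the question, this should give the same G column as the diff()-based version, since that series starts with 0.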