Say I have the following dataframe
holder
0
1
2
0
1
2
0
1
0
1
2
I want to group each run of numbers that starts at 0 and ends at that run's max value, and assign each run its own group number. So:
holder group
0 1
1 1
2 1
0 2
1 2
2 2
0 3
1 3
0 4
1 4
2 4
I tried:
n = 3
df['group'] = [int(i/n) for i,x in enumerate(df.holder)]
But this returns
holder group
0 1
1 1
2 1
0 2
1 2
2 2
0 3
1 3
0 3
1 4
2 4
CodePudding user response:
Assuming holder is monotonically nondecreasing until another 0 occurs, you can identify the zeroes and create groups by taking the cumulative sum.
import pandas as pd
df = pd.DataFrame({'holder': [0, 1, 2, 0, 1, 2, 0, 1, 0, 1, 2]})
# identify 0s and create groups
df['group'] = df['holder'].eq(0).cumsum()
print(df)
holder group
0 0 1
1 1 1
2 2 1
3 0 2
4 1 2
5 2 2
6 0 3
7 1 3
8 0 4
9 1 4
10 2 4
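To see why this works, the two steps can be looked at separately. This is only a sketch on the same sample data; the name starts is just an illustrative variable, not something from the question.

```python
import pandas as pd

df = pd.DataFrame({'holder': [0, 1, 2, 0, 1, 2, 0, 1, 0, 1, 2]})

# Step 1: boolean mask that is True wherever a new run begins (holder == 0).
# "starts" is a hypothetical helper name used only for illustration.
starts = df['holder'].eq(0)

# Step 2: the cumulative sum counts how many starts have been seen so far,
# because True is treated as 1 and False as 0.
df['group'] = starts.cumsum()

print(starts.tolist())
# [True, False, False, True, False, False, True, False, True, False, False]
print(df['group'].tolist())
# [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4]
```

Note that this relies on every run actually starting with a 0; rows before the first 0 would end up in group 0.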
CodePudding user response:
Assuming each group's numbers always increase, you can check whether each number is less than or equal to the one before it (which marks the start of a new group), then take the cumulative sum, which turns the booleans into group numbers.
df['group'] = df['holder'].diff().le(0).cumsum() + 1
Result:
holder group
0 0 1
1 1 1
2 2 1
3 0 2
4 1 2
5 2 2
6 0 3
7 1 3
8 0 4
9 1 4
10 2 4
(I'm using <= specifically instead of < in case of two adjacent 0s.)
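A small sketch of the adjacent-0s case, using made-up sample data rather than the data from the question, shows the difference between le (<=) and lt (<):

```python
import pandas as pd

# Hypothetical sample with two adjacent 0s (not the data from the question).
s = pd.Series([0, 1, 0, 0, 1], name='holder')

# diff() is NaN for the first row, and NaN <= 0 is False, so the first
# row never counts as a break; the "+ 1" makes the groups start at 1.
le_groups = s.diff().le(0).cumsum() + 1  # <= : an equal value starts a new group
lt_groups = s.diff().lt(0).cumsum() + 1  # <  : an equal value stays in the same group

print(le_groups.tolist())  # [1, 1, 2, 3, 3]
print(lt_groups.tolist())  # [1, 1, 2, 2, 2]
```

With <=, the second 0 gets its own group; with <, it would be merged into the group of the 0 before it.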
This was inspired by Nickil Maveli's answer on "Groupby conditional sum of adjacent rows" but the cleaner method was posted by d.b in a comment here.