Grouping a set of numbers that reoccur in a pandas DataFrame

Time:05-24

Say I have the following dataframe

holder
0
1
2
0
1
2
0
1
0
1
2

I want to group each run of numbers that starts at 0 and counts up toward the max value, and assign each run its own group number. So:

holder  group
0        1
1        1
2        1
0        2
1        2
2        2
0        3
1        3
0        4
1        4
2        4

I tried:

n = 3
df['group'] = [int(i/n) for i,x in enumerate(df.holder)]

But this returns

holder  group
0        1
1        1
2        1
0        2
1        2
2        2
0        3
1        3
0        3
1        4
2        4

CodePudding user response:

Assuming each group starts with a 0 and contains no other 0 (for example, holder increases until the next 0 occurs), you can flag the zeroes and create the groups by taking their cumulative sum.

import pandas as pd
df = pd.DataFrame({'holder': [0, 1, 2, 0, 1, 2, 0, 1, 0, 1, 2]})
# identify 0s and create groups
df['group'] = df['holder'].eq(0).cumsum()
print(df)
    holder  group
0        0      1
1        1      1
2        2      1
3        0      2
4        1      2
5        2      2
6        0      3
7        1      3
8        0      4
9        1      4
10       2      4
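
To make the two steps visible, here is a minimal sketch that keeps the intermediate boolean mask around. The names is_start and group are just illustrative, and it relies on the same assumption that a 0 only ever appears at the start of a group.

```python
import pandas as pd

df = pd.DataFrame({'holder': [0, 1, 2, 0, 1, 2, 0, 1, 0, 1, 2]})

# True on every row where a new run begins (holder == 0)
is_start = df['holder'].eq(0)

# Each True counts as 1, so the running total is the number of
# run starts seen so far, which serves as the group number
group = is_start.cumsum()

print(is_start.tolist())
print(group.tolist())
```

Because the first row is a 0, the numbering starts at 1 without any offset.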

CodePudding user response:

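Assuming the numbers within each group always increase, you can flag every row whose value is less than or equal to the one before it, then take the cumulative sum, which turns those booleans into group numbers. The first row has nothing to compare against, so it is never flagged, hence the + 1 to start numbering at 1.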

df['group'] = df['holder'].diff().le(0).cumsum() + 1

Result:

    holder  group
0        0      1
1        1      1
2        2      1
3        0      2
4        1      2
5        2      2
6        0      3
7        1      3
8        0      4
9        1      4
10       2      4

(I'm using <= specifically instead of < in case of two adjacent 0s.)
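
A small sketch of that difference, using made-up data with two adjacent 0s (the series s is hypothetical, not from the question):

```python
import pandas as pd

# Hypothetical data: the second group is a lone 0, followed directly by another 0
s = pd.Series([0, 1, 0, 0, 1])

# <= treats a repeated value as the start of a new group
with_le = s.diff().le(0).cumsum() + 1

# < only reacts to a drop, so the two adjacent 0s end up in the same group
with_lt = s.diff().lt(0).cumsum() + 1

print(with_le.tolist())  # [1, 1, 2, 3, 3]
print(with_lt.tolist())  # [1, 1, 2, 2, 2]
```

The first diff() value is NaN, and comparing NaN with le or lt gives False, so the first row never starts an extra group.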

This was inspired by Nickil Maveli's answer on "Groupby conditional sum of adjacent rows" but the cleaner method was posted by d.b in a comment here.
