Home > Software design >  How to assign a group based on consecutive sum in pandas
How to assign a group based on consecutive sum in pandas

Time:10-07

I have a dataframe as below:

A         B          code cumul_sum
group1    group1_1    A     1
group1    group1_1    A     2
group1    group1_1    B     1
group1    group1_1    A     1
group1    group1_1    A     2
group1    group1_1    A     3 
group2    group2_1    A     1
group2    group2_1    A     2
group2    group2_1    A     3 

Please assume column A and B are belonging to same category. I wanted to group and the expected result are as follows

A         B          code cumul_sum   **groupcat**
group1    group1_1    A     1          group1
group1    group1_1    A     2          group1
group1    group1_1    B     1          group2
group1    group1_1    A     1          group3
group1    group1_1    A     2          group3
group1    group1_1    A     3          group3
group2    group2_1    A     1          group1
group2    group2_1    A     2          group1
group2    group2_1    A     3          group1

Since there is a code B that occurred in the middle, group1 is not assigned to the third row and then the next value is a fresh categorical value.

EDIT: To also reset the group-cat for group A and B.

Please advice

CodePudding user response:

You can modify previous solution:

#tested consecutive values by code column
s = df['code'].ne(df['code'].shift()).cumsum()
df['groupcat'] = 'group'   s.astype(str)

print (df)
        A         B code  cumul_sum groupcat
0  group1  group1_1    A          1   group1
1  group1  group1_1    A          2   group1
2  group1  group1_1    B          1   group2
3  group1  group1_1    A          1   group3
4  group1  group1_1    A          2   group3
5  group1  group1_1    A          3   group3

#tested consecutive groups by A, B, code
s = df[['A','B','code']].ne(df[['A','B','code']].shift()).any(axis=1).cumsum()
df['groupcat'] = 'group'   s.astype(str)

print (df)
        A         B code  cumul_sum groupcat
0  group1  group1_1    A          1   group1
1  group1  group1_1    A          2   group1
2  group1  group1_1    B          1   group2
3  group1  group1_1    A          1   group3
4  group1  group1_1    A          2   group3
5  group1  group1_1    A          3   group3

EDIT:

s = df['code'].ne(df['code'].shift()).cumsum()


df['group'] = ('group'   df.assign(s=s)
                           .groupby(['A','B'])['s']
                           .rank('dense').astype(int).astype(str))
print (df)
        A         B code  cumul_sum   group
0  group1  group1_1    A          1  group1
1  group1  group1_1    A          2  group1
2  group1  group1_1    B          1  group2
3  group1  group1_1    A          1  group3
4  group1  group1_1    A          2  group3
5  group1  group1_1    A          3  group3
6  group2  group2_1    A          1  group1
7  group2  group2_1    A          2  group1
8  group2  group2_1    A          3  group1

Alternative solution:

s = (df[['A','B','code']].ne(df[['A','B','code']].shift()).any(axis=1)
        .groupby([df.A, df.B]).cumsum())
df['groupcat'] = 'group'   s.astype(str)

print (df)
        A         B code  cumul_sum groupcat
0  group1  group1_1    A          1   group1
1  group1  group1_1    A          2   group1
2  group1  group1_1    B          1   group2
3  group1  group1_1    A          1   group3
4  group1  group1_1    A          2   group3
5  group1  group1_1    A          3   group3
6  group2  group2_1    A          1   group1
7  group2  group2_1    A          2   group1
8  group2  group2_1    A          3   group1

CodePudding user response:

Since your groups ignore the code, you can just restart on each 1 value:

df['groupcat'] = df['cumul_sum'].eq(1).cumsum()

As string:

df['groupcat'] = df['cumul_sum'].eq(1).cumsum().astype(str).radd('group')

output:

        A         B code  cumul_sum groupcat
0  group1  group1_1    A          1   group1
1  group1  group1_1    A          2   group1
2  group1  group1_1    B          1   group2
3  group1  group1_1    A          1   group3
4  group1  group1_1    A          2   group3
5  group1  group1_1    A          3   group3
  • Related