Home > Software design >  Reset pandas cumsum when the condition is not satisified
Reset pandas cumsum when the condition is not satisified

Time:10-06

I went through different stackoverflow questions and finally posting it because I couldnt solve one of the issues I am facing. I have a dataframe like below

A        B          C
group1   group1_c   12
group1   group1_c   12
group1   group1_c   12
group1   group1_c   1
group1   group1_c   12
group1   group1_c   12

I have to match two rows together and whenever the value matches, I cumsum it. To do this,

df['cumul'] = df['C'].eq(df.groupby(['A','B'])['C'].shift(1).ffill()).groupby([df['A'],df['B']).cumsum()

Once I do this,

A        B          C    Cumul
group1   group1_c   12   0
group1   group1_c   12   1
group1   group1_c   12   2
group1   group1_c   1    2
group1   group1_c   12   3 
group1   group1_c   12   3

Whereas I want to reset if the condition is not met.Expected solution

A        B          C    Cumul
group1   group1_c   12   0
group1   group1_c   12   1
group1   group1_c   12   2
group1   group1_c   1    0
group1   group1_c   12   0 
group1   group1_c   12   1

Please advice Thank you

CodePudding user response:

If need count groups per consecutive values of C column use Series.ne with Series.shift and cumulative sum, last use counter by GroupBy.cumcount:

df['cumul'] = df.groupby(df['C'].ne(df['C'].shift()).cumsum()).cumcount()

print (df)
        A         B   C  cumul
0  group1  group1_c  12      0
1  group1  group1_c  12      1
2  group1  group1_c  12      2
3  group1  group1_c   1      0
4  group1  group1_c  12      0
5  group1  group1_c  12      1

If need per A, B groups also add both groups:

print (df)
        A         B   C
0  group1  group1_c  12
1  group1  group2_c  12 <-changed groups
2  group1  group2_c  12 <-changed groups
3  group1  group1_c   1
4  group1  group1_c  12
5  group1  group1_c  12

s = df['C'].ne(df['C'].shift()).cumsum()
df['cumul'] = df.groupby([df['A'],df['B'], s]).cumcount()

df['cumul1'] = df.groupby(df['C'].ne(df['C'].shift()).cumsum()).cumcount()
print (df)
        A         B   C  cumul  cumul1
0  group1  group1_c  12      0       0
1  group1  group2_c  12      0       1
2  group1  group2_c  12      1       2
3  group1  group1_c   1      0       0
4  group1  group1_c  12      0       0
5  group1  group1_c  12      1       1

Alternative solution:

s = df[['A','B','C']].ne(df[['A','B','C']].shift()).any(axis=1).cumsum()
df['cumul'] = df.groupby(s).cumcount()
  • Related