Home > other >  How to reset the cumulative sum once the sequence is no longer repeated - pandas
How to reset the cumulative sum once the sequence is no longer repeated - pandas

Time:09-05

I have a dataframe as below where I grouped by A and B and performed a cumsum and assign it as column C. But the issue is, whenever I do the cumsum, the value is not getting resetted.

A   B   Cumsum
1   2   0
1   2   1
1   2   2
1   2   3 
2   3   3
1   2   4

By right , the last row will have to reset to 0. Please let me know how I can do this in pandas

CodePudding user response:

You need to create another group variable which detects if any value in A or B has changed:

df['cumcount'] = df[['A', 'B']].diff().ne(0).any(axis=1).cumsum().pipe(
    lambda s: s.groupby(s).cumcount()
)

df
   A  B  Cumsum  cumcount
0  1  2       0         0
1  1  2       1         1
2  1  2       2         2
3  1  2       3         3
4  2  3       3         0
5  1  2       4         0

To handle types other than numeric, you can use shift and explicit comparison for diff:

(df[['A', 'B']] != df[['A', 'B']].shift()).ne(0).any(1).cumsum().pipe(
    lambda s: s.groupby(s).cumcount()
)
0    0
1    1
2    2
3    3
4    0
5    0
dtype: int64

CodePudding user response:

We can combine them into tuple

s = df[['A','B']].apply(tuple,1)
df['new'] = s.groupby(s.ne(s.shift()).cumsum()).cumcount()
Out[100]: 
0    0
1    1
2    2
3    3
4    0
5    0
dtype: int64
  • Related