I have a dataframe as below where I grouped by A and B and performed a cumsum and assign it as column C. But the issue is, whenever I do the cumsum, the value is not getting resetted.
A B Cumsum
1 2 0
1 2 1
1 2 2
1 2 3
2 3 3
1 2 4
By right , the last row will have to reset to 0. Please let me know how I can do this in pandas
CodePudding user response:
You need to create another group variable which detects if any value in A
or B
has changed:
df['cumcount'] = df[['A', 'B']].diff().ne(0).any(axis=1).cumsum().pipe(
lambda s: s.groupby(s).cumcount()
)
df
A B Cumsum cumcount
0 1 2 0 0
1 1 2 1 1
2 1 2 2 2
3 1 2 3 3
4 2 3 3 0
5 1 2 4 0
To handle types other than numeric, you can use shift and explicit comparison for diff
:
(df[['A', 'B']] != df[['A', 'B']].shift()).ne(0).any(1).cumsum().pipe(
lambda s: s.groupby(s).cumcount()
)
0 0
1 1
2 2
3 3
4 0
5 0
dtype: int64
CodePudding user response:
We can combine them into tuple
s = df[['A','B']].apply(tuple,1)
df['new'] = s.groupby(s.ne(s.shift()).cumsum()).cumcount()
Out[100]:
0 0
1 1
2 2
3 3
4 0
5 0
dtype: int64