Assume I have the following DataFrame:
| | date | flag | user | num |
|---|---|---|---|---|
| 0 | 2019-01-01 | 1 | a | 10 |
| 1 | 2019-01-02 | 0 | a | 20 |
| 2 | 2019-01-03 | 1 | b | 30 |
| 3 | 2019-03-04 | 1 | b | 40 |
I want to compute a cumulative sum of `num` per `user`, counting only rows where `flag == 1` (rows with `flag == 0` just carry the running total forward), so that I get this:
| | date | flag | user | num | cumsum |
|---|---|---|---|---|---|
| 0 | 2019-01-01 | 1 | a | 10 | 10 |
| 1 | 2019-01-02 | 0 | a | 20 | 10 |
| 2 | 2019-01-03 | 1 | b | 30 | 30 |
| 3 | 2019-03-04 | 1 | b | 40 | 70 |
So far I have been able to take the cumsum over the flagged rows, but without grouping by user:
df['cumsum'] = df[df['flag'] == 1 ]['num'].transform(pd.Series.cumsum)
or to take the cumsum per user, but without regard to the flag:
df['cumsum'] = df.groupby('user')['num'].transform(pd.Series.cumsum)
I need help making them work together.
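For comparison, here is one way to combine the two attempts more literally; it is a sketch, not the approach used in the answers. It takes the per-user cumsum over the flagged rows only, lets index alignment leave NaN on the unflagged rows, and then forward-fills within each user. Filling rows that come before a user's first flagged row with 0 is an assumption about the desired output, since the sample data has no such row.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'date': ['2019-01-01', '2019-01-02', '2019-01-03', '2019-03-04'],
    'flag': [1, 0, 1, 1],
    'user': ['a', 'a', 'b', 'b'],
    'num': [10, 20, 30, 40],
})

# Per-user cumulative sum, computed only over rows where flag == 1.
flagged = df[df['flag'] == 1]
partial = flagged.groupby('user')['num'].cumsum()

# Assigning a Series aligns on the index, so unflagged rows become NaN.
df['cumsum'] = partial

# Carry the last running total forward within each user; rows before a
# user's first flagged row get 0 (assumption). Cast back to int after filling.
df['cumsum'] = df.groupby('user')['cumsum'].ffill().fillna(0).astype(int)

print(df)
```

Zeroing out the unflagged values before the cumsum avoids the fill step entirely, which is why that route is usually simpler.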
Answer:
You could multiply `num` by `flag`, which makes `num` 0 wherever `flag` is 0 (this relies on `flag` holding only 0 and 1), then group by `user` and take the `cumsum`:
df['cumsum'] = df['num'].mul(df['flag']).groupby(df['user']).cumsum()
Output:
>>> df
date flag user num cumsum
0 2019-01-01 1 a 10 10
1 2019-01-02 0 a 20 10
2 2019-01-03 1 b 30 30
3 2019-03-04 1 b 40 70
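The multiplication only acts as a filter because `flag` holds just 0 and 1. A sketch of what changes if it could hold other values: the data below is hypothetical (the last row's `flag` is set to 2), and multiplying by the boolean mask `flag.eq(1)` instead of `flag` itself counts a row only when `flag` is exactly 1.

```python
import pandas as pd

# Hypothetical data: same as the question, except the last row has flag == 2
# to show what happens when flag is not strictly 0/1.
df = pd.DataFrame({
    'date': ['2019-01-01', '2019-01-02', '2019-01-03', '2019-03-04'],
    'flag': [1, 0, 1, 2],
    'user': ['a', 'a', 'b', 'b'],
    'num': [10, 20, 30, 40],
})

# Multiplying by flag itself weights the last row by 2 (40 * 2 = 80).
by_value = df['num'].mul(df['flag']).groupby(df['user']).cumsum()

# Multiplying by the boolean mask counts a row only when flag == 1
# (True acts as 1, False as 0).
by_mask = df['num'].mul(df['flag'].eq(1)).groupby(df['user']).cumsum()

print(by_value.tolist())  # [10, 10, 30, 110]
print(by_mask.tolist())   # [10, 10, 30, 30]
```

On the question's own data, where `flag` is only 0 or 1, both forms give the same result.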
Answer:
Use `Series.where` to set `num` to 0 wherever `flag` is not 1, then group by `user` and take the `cumsum`:
df['cumsum'] = df['num'].where(df['flag'].eq(1), 0).groupby(df['user']).cumsum()
date flag user num cumsum
0 2019-01-01 1 a 10 10
1 2019-01-02 0 a 20 10
2 2019-01-03 1 b 30 30
3 2019-03-04 1 b 40 70
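`Series.mask` is the complement of `Series.where`: it replaces values where the condition is True. A small sketch, on the question's data, checking that the `where`, `mask`, and multiplication formulations all give the same running totals:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'date': ['2019-01-01', '2019-01-02', '2019-01-03', '2019-03-04'],
    'flag': [1, 0, 1, 1],
    'user': ['a', 'a', 'b', 'b'],
    'num': [10, 20, 30, 40],
})

# where keeps num where flag == 1 and replaces the rest with 0.
via_where = df['num'].where(df['flag'].eq(1), 0).groupby(df['user']).cumsum()

# mask does the opposite: it replaces num with 0 where flag != 1.
via_mask = df['num'].mask(df['flag'].ne(1), 0).groupby(df['user']).cumsum()

# Multiplying num by the 0/1 flag column.
via_mul = df['num'].mul(df['flag']).groupby(df['user']).cumsum()

# Compare plain value lists so differences in Series name do not matter.
assert via_where.tolist() == via_mask.tolist() == via_mul.tolist()
print(via_where.tolist())
```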