Pandas group by cumsum with a flag condition

Time:12-16

Assume I have the following DataFrame:

         date  flag user  num
0  2019-01-01     1    a   10
1  2019-01-02     0    a   20
2  2019-01-03     1    b   30
3  2019-03-04     1    b   40
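To reproduce the frame above, here is a minimal sketch. It assumes the dates are kept as plain strings, which the post does not state; parsing them with pd.to_datetime would not change the cumulative sum.

```python
import pandas as pd

# Sample data copied from the table above.
# Assumption: "date" is stored as strings, not datetimes.
df = pd.DataFrame({
    "date": ["2019-01-01", "2019-01-02", "2019-01-03", "2019-03-04"],
    "flag": [1, 0, 1, 1],
    "user": ["a", "a", "b", "b"],
    "num": [10, 20, 30, 40],
})
print(df)
```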

I want to create a cumulative sum of num, grouped by user, that only counts rows where flag == 1, so that I get this:

         date  flag user  num  cumsum
0  2019-01-01     1    a   10      10
1  2019-01-02     0    a   20      10
2  2019-01-03     1    b   30      30
3  2019-03-04     1    b   40      70

So far I have been able to compute the cumulative sum over the rows where flag == 1, ignoring the grouping by user:

df['cumsum'] = df[df['flag'] == 1 ]['num'].transform(pd.Series.cumsum)

or the cumulative sum by user, ignoring the flag:

df['cumsum'] = df.groupby('user')['num'].transform(pd.Series.cumsum)

I need help combining the two.

CodePudding user response:

You could multiply num by flag so that num becomes 0 wherever flag is 0 (this relies on flag only taking the values 0 and 1), then group by user and take the cumulative sum:

df['cumsum'] = df['num'].mul(df['flag']).groupby(df['user']).cumsum()

Output:

>>> df
         date  flag user  num  cumsum
0  2019-01-01     1    a   10      10
1  2019-01-02     0    a   20      10
2  2019-01-03     1    b   30      30
3  2019-03-04     1    b   40      70
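A self-contained version of this approach, as a sketch: the sample frame is rebuilt from the question (dates as strings is an assumption), and the intermediate name masked is only there for illustration.

```python
import pandas as pd

# Sample data from the question (dates kept as strings by assumption).
df = pd.DataFrame({
    "date": ["2019-01-01", "2019-01-02", "2019-01-03", "2019-03-04"],
    "flag": [1, 0, 1, 1],
    "user": ["a", "a", "b", "b"],
    "num": [10, 20, 30, 40],
})

# Zero out num on rows where flag is 0; this works because flag is 0 or 1.
masked = df["num"].mul(df["flag"])

# Cumulative sum of the masked values within each user.
df["cumsum"] = masked.groupby(df["user"]).cumsum()
print(df)
```

Rows with flag == 0 contribute nothing, so they simply carry forward the running total of their user.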

CodePudding user response:

Use Series.where to replace num with 0 wherever flag is not 1, then group by user and take the cumulative sum:

df['cumsum'] = df['num'].where(df['flag'].eq(1), 0).groupby(df['user']).cumsum()

         date  flag user  num  cumsum
0  2019-01-01     1    a   10      10
1  2019-01-02     0    a   20      10
2  2019-01-03     1    b   30      30
3  2019-03-04     1    b   40      70
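On the sample data this gives the same result as multiplying by flag. The two would differ if flag could hold values other than 0 and 1. The sketch below uses a hypothetical frame (the flag value 2 is invented for illustration, not taken from the question) to show the difference.

```python
import pandas as pd

# Hypothetical data: row 2 has flag == 2, which the question's data never contains.
df2 = pd.DataFrame({
    "flag": [1, 0, 2, 1],
    "user": ["a", "a", "b", "b"],
    "num": [10, 20, 30, 40],
})

# where(): keep num only on rows where flag is exactly 1, else use 0.
by_where = df2["num"].where(df2["flag"].eq(1), 0).groupby(df2["user"]).cumsum()

# mul(): scales num by flag, so flag == 2 doubles the value instead of dropping it.
by_mul = df2["num"].mul(df2["flag"]).groupby(df2["user"]).cumsum()

print(by_where.tolist())  # [10, 10, 0, 40]
print(by_mul.tolist())    # [10, 10, 60, 100]
```

If flag is guaranteed to be 0 or 1, either form should work; where states the condition explicitly.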