Home > Mobile >  Groupby Date and Class and Cumulative Sum - Pandas
Groupby Date and Class and Cumulative Sum - Pandas

Time:09-29

I have a dataframe like this:

Category  |  Date      |  Val1  |  Val2  |  Val3  |
----------|------------|--------|--------|--------|
  A       | 2022-01-01 |  50    |   70   |   80   |
  B       | 2022-01-01 |  30    |   50   |   20   |
  C       | 2022-01-01 |  40    |   60   |   90   |
  A       | 2022-01-05 |  10    |   40   |   20   |
  B       | 2022-01-05 |  40    |   75   |   80   |
  C       | 2022-01-05 |  60    |   45   |   65   |
  A       | 2022-01-10 |  50    |   40   |   35   |
  B       | 2022-01-10 |  50    |   50   |   25   |
  C       | 2022-01-10 |  50    |   95   |   85   |

What I want to achieve is groupby date and class and take the cumulative sum. So expected output would be like this:

Category  |  Date      |  Val1  |  Val2  |  Val3  |
----------|------------|--------|--------|--------|
  A       | 2022-01-01 |  50    |   70   |   80   |
  B       | 2022-01-01 |  30    |   50   |   20   |
  C       | 2022-01-01 |  40    |   60   |   90   |
  A       | 2022-01-05 |  60    |   110  |   100  |
  B       | 2022-01-05 |  70    |   125  |   100  |
  C       | 2022-01-05 |  100   |   105  |   155  |
  A       | 2022-01-10 |  110   |   150  |   135  |
  B       | 2022-01-10 |  120   |   175  |   125  |
  C       | 2022-01-10 |  150   |   200  |   240  |

All numerical columns are cumsum (Val1, Val2 & Val3)

I tried to take reference from this code but couldn't make it work in my case.

CodePudding user response:

If not groupby by date, only by Category ouput is:

cols = ['Val1','Val2','Val3']
df[cols] = df.groupby('Category')[cols].cumsum()
print (df)
  Category       Date  Val1  Val2  Val3
0        A 2022-01-01    50    70    80
1        B 2022-01-01    30    50    20
2        C 2022-01-01    40    60    90
3        A 2022-01-05    60   110   100
4        B 2022-01-05    70   125   100
5        C 2022-01-05   100   105   155
6        A 2022-01-10   110   150   135
7        B 2022-01-10   120   175   125
8        C 2022-01-10   150   200   240

If need also groupby per months and Category use:

cols = ['Val1','Val2','Val3']
df[cols] = df.groupby([pd.Grouper(key='Date', freq='M'), 'Category'])[cols].cumsum()
#alternative
#df[cols] = df.groupby([df['Date'].dt.to_period('m'), 'Category'])[cols].cumsum()
print (df)

  Category       Date  Val1  Val2  Val3
0        A 2022-01-01    50    70    80
1        B 2022-01-01    30    50    20
2        C 2022-01-01    40    60    90
3        A 2022-01-05    60   110   100
4        B 2022-01-05    70   125   100
5        C 2022-01-05   100   105   155
6        A 2022-01-10   110   150   135
7        B 2022-01-10   120   175   125
8        C 2022-01-10   150   200   240

CodePudding user response:

You can use groupby.cumsum (on Category only, it doesn't make sense to group by date as each group would be only one value so nothing to cumsum) and update the DataFrame:

df.update(df.groupby(['Category']).cumsum())

output

  Category        Date  Val1  Val2  Val3
0        A  2022-01-01    50    70    80
1        B  2022-01-01    30    50    20
2        C  2022-01-01    40    60    90
3        A  2022-01-05    60   110   100
4        B  2022-01-05    70   125   100
5        C  2022-01-05   100   105   155
6        A  2022-01-10   110   150   135
7        B  2022-01-10   120   175   125
8        C  2022-01-10   150   200   240
  • Related