I have a dataframe like this:
Category | Date | Val1 | Val2 | Val3 |
----------|------------|--------|--------|--------|
A | 2022-01-01 | 50 | 70 | 80 |
B | 2022-01-01 | 30 | 50 | 20 |
C | 2022-01-01 | 40 | 60 | 90 |
A | 2022-01-05 | 10 | 40 | 20 |
B | 2022-01-05 | 40 | 75 | 80 |
C | 2022-01-05 | 60 | 45 | 65 |
A | 2022-01-10 | 50 | 40 | 35 |
B | 2022-01-10 | 50 | 50 | 25 |
C | 2022-01-10 | 50 | 95 | 85 |
What I want to achieve is groupby date and class and take the cumulative sum. So expected output would be like this:
Category | Date | Val1 | Val2 | Val3 |
----------|------------|--------|--------|--------|
A | 2022-01-01 | 50 | 70 | 80 |
B | 2022-01-01 | 30 | 50 | 20 |
C | 2022-01-01 | 40 | 60 | 90 |
A | 2022-01-05 | 60 | 110 | 100 |
B | 2022-01-05 | 70 | 125 | 100 |
C | 2022-01-05 | 100 | 105 | 155 |
A | 2022-01-10 | 110 | 150 | 135 |
B | 2022-01-10 | 120 | 175 | 125 |
C | 2022-01-10 | 150 | 200 | 240 |
All numerical columns are cumsum (Val1, Val2 & Val3)
I tried to take reference from this code but couldn't make it work in my case.
CodePudding user response:
If not groupby by date
, only by Category
ouput is:
cols = ['Val1','Val2','Val3']
df[cols] = df.groupby('Category')[cols].cumsum()
print (df)
Category Date Val1 Val2 Val3
0 A 2022-01-01 50 70 80
1 B 2022-01-01 30 50 20
2 C 2022-01-01 40 60 90
3 A 2022-01-05 60 110 100
4 B 2022-01-05 70 125 100
5 C 2022-01-05 100 105 155
6 A 2022-01-10 110 150 135
7 B 2022-01-10 120 175 125
8 C 2022-01-10 150 200 240
If need also groupby per months and Category
use:
cols = ['Val1','Val2','Val3']
df[cols] = df.groupby([pd.Grouper(key='Date', freq='M'), 'Category'])[cols].cumsum()
#alternative
#df[cols] = df.groupby([df['Date'].dt.to_period('m'), 'Category'])[cols].cumsum()
print (df)
Category Date Val1 Val2 Val3
0 A 2022-01-01 50 70 80
1 B 2022-01-01 30 50 20
2 C 2022-01-01 40 60 90
3 A 2022-01-05 60 110 100
4 B 2022-01-05 70 125 100
5 C 2022-01-05 100 105 155
6 A 2022-01-10 110 150 135
7 B 2022-01-10 120 175 125
8 C 2022-01-10 150 200 240
CodePudding user response:
You can use groupby.cumsum
(on Category only, it doesn't make sense to group by date as each group would be only one value so nothing to cumsum) and update
the DataFrame:
df.update(df.groupby(['Category']).cumsum())
output
Category Date Val1 Val2 Val3
0 A 2022-01-01 50 70 80
1 B 2022-01-01 30 50 20
2 C 2022-01-01 40 60 90
3 A 2022-01-05 60 110 100
4 B 2022-01-05 70 125 100
5 C 2022-01-05 100 105 155
6 A 2022-01-10 110 150 135
7 B 2022-01-10 120 175 125
8 C 2022-01-10 150 200 240