I am trying to sum a value column based on 2 categorical columns values.
dict1 = {'A': {0: 'A0', 1: 'A0', 2: 'A0', 3: 'A0', 4: 'A1', 5: 'A1', 6: 'A1', 7: 'A1', 8: 'A1', 9: 'A1'}, 'B': {0: 'B0', 1: 'B1', 2: 'B2', 3: 'B3', 4: 'B4', 5: 'B5', 6: 'B6', 7: 'B7', 8: 'B8', 9: 'B9'}, 'C': {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9}, 'D': {0: 10, 1: 11, 2: 12, 3: 13, 4: 14, 5: 15, 6: 16, 7: 17, 8: 18, 9: 19}, 'E': {0: 'E0', 1: 'E1', 2: 'E0', 3: 'E1', 4: 'E0', 5: 'E1', 6: 'E0', 7: 'E1', 8: 'E0', 9: 'E1'}}
df2 = pd.DataFrame(dict1)
I am now trying to get a cumsum
of column D based on column A & E, i.e. A0E0, A0E1, A1E0 & so on should have a cumulative sum like this in column F.
A B C D E F
0 A0 B0 0 10 E0 10.0
1 A0 B1 1 11 E1 11.0
2 A0 B2 2 12 E0 22.0
3 A0 B3 3 13 E1 24.0
4 A1 B4 4 14 E0 14.0
5 A1 B5 5 15 E1 15.0
6 A1 B6 6 16 E0 30.0
7 A1 B7 7 17 E1 32.0
8 A1 B8 8 18 E0 48.0
9 A1 B9 9 19 E1 51.0
CodePudding user response:
df2['F'] = df2.groupby(['A', 'E'])['D'].cumsum()
df2
A B C D E F
0 A0 B0 0 10 E0 10
1 A0 B1 1 11 E1 11
2 A0 B2 2 12 E0 22
3 A0 B3 3 13 E1 24
4 A1 B4 4 14 E0 14
5 A1 B5 5 15 E1 15
6 A1 B6 6 16 E0 30
7 A1 B7 7 17 E1 32
8 A1 B8 8 18 E0 48
9 A1 B9 9 19 E1 51