I am trying to create an equation that does Revenue - Cost and them cumulates the sums by group. Is there a way to use group by with an equation? I have provided some sample data below.
Data:
stack = pd.DataFrame({'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
'Group': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C'],
'Revenue': [2000, 3000, 2000, 1000, 4000, 3000, 1500, 2200, 8000, 4500, 5600, 7000],
'Cost': [500, 400, 300, 700, 155, 355, 990, 1000, 800, 250, 450, 900]},
columns=['ID', 'Group', 'Revenue', 'Cost'])
stack['Profit'] = stack['Revenue'] - stack['Cost']
I tried doing this for cumsum by group but it didn't work.
stack['Total Profit'] = stack['Profit'].cumsum().groupby(['Group'])
Do I need to use lambda?
CodePudding user response:
just rearrange the statement as follows
stack['Total Profit']=stack.groupby(['Group'])['Profit'].cumsum()
stack
ID Group Revenue Cost Profit Total Profit
0 1 A 2000 500 1500 1500
1 2 A 3000 400 2600 4100
2 3 A 2000 300 1700 5800
3 4 A 1000 700 300 6100
4 5 B 4000 155 3845 3845
5 6 B 3000 355 2645 6490
6 7 B 1500 990 510 7000
7 8 B 2200 1000 1200 8200
8 9 C 8000 800 7200 7200
9 10 C 4500 250 4250 11450
10 11 C 5600 450 5150 16600
11 12 C 7000 900 6100 22700
CodePudding user response:
Rearranging how these methods are called will do the trick:
stack['Total Profit'] = stack.groupby('Group')['Profit'].cumsum()
now stack
will have this:
ID Group Revenue Cost Profit Total Profit
0 1 A 2000 500 1500 1500
1 2 A 3000 400 2600 4100
2 3 A 2000 300 1700 5800
3 4 A 1000 700 300 6100
4 5 B 4000 155 3845 3845
5 6 B 3000 355 2645 6490
6 7 B 1500 990 510 7000
7 8 B 2200 1000 1200 8200
8 9 C 8000 800 7200 7200
9 10 C 4500 250 4250 11450
10 11 C 5600 450 5150 16600
11 12 C 7000 900 6100 22700