Here is my table:
date id action value
2021-09-02 aa income 500
2021-09-02 aa spending 500
2021-09-02 aa spending 45
2021-09-03 aa income 30
2021-09-03 aa income 30
2021-09-03 aa spending 25
2021-09-04 b1 income 100
2021-09-05 b1 income 500
2021-09-05 b1 spending 500
2021-09-05 b1 spending 45
2021-09-06 b1 income 30
2021-09-06 b1 income 30
2021-09-07 b1 spending 25
As you see there is two types of actions "income" and "spending". What I want is to add column with accumulation of "value" at each moment for each of my id. And after each action "income" it must increase by value of that "income" and when there is "spending" it must decrease by value of that decrease. So result must look like this:
date id action value saved
2021-09-02 aa income 500 0
2021-09-02 aa spending 400 500
2021-09-02 aa spending 40 100
2021-09-03 aa income 30 60
2021-09-03 aa income 30 90
2021-09-03 aa spending 25 120
2021-09-04 b1 income 100 0
2021-09-05 b1 income 500 100
2021-09-05 b1 spending 500 600
2021-09-05 b1 spending 45 100
2021-09-06 b1 income 30 55
2021-09-06 b1 income 30 85
2021-09-07 b1 spending 25 115
How to do that?
CodePudding user response:
You need to first negate value
based on action
, then groupby id
and finally, since you want to start at value 0, shift the rows by one on the groups. Finally get the cumsum:
df.value.where(df.action=='income', -df.value).groupby(df.id).shift().groupby(df.id).cumsum()
>>> df = pd.DataFrame({'date': ('2021-09-02', '2021-09-02', '2021-09-02', '2021-09-03', '2021-09-03', '2021-09-03', '2021-09-04', '2021-09-05', '2021-09-05', '2021-09-05', '2021-09-06', '2021-09-06', '2021-09-07'), 'id': ('aa', 'aa', 'aa', 'aa', 'aa', 'aa', 'b1', 'b1', 'b1', 'b1', 'b1', 'b1', 'b1'), 'action': ('income', 'spending', 'spending', 'income', 'income', 'spending', 'income', 'income', 'spending', 'spending', 'income', 'income', 'spending'), 'value': map(int, ('500', '500', '45', '30', '30', '25', '100', '500', '500', '45', '30', '30', '25'))})
>>> df
date id action value
0 2021-09-02 aa income 500
1 2021-09-02 aa spending 400
2 2021-09-02 aa spending 40
3 2021-09-03 aa income 30
4 2021-09-03 aa income 30
5 2021-09-03 aa spending 25
6 2021-09-04 b1 income 100
7 2021-09-05 b1 income 500
8 2021-09-05 b1 spending 500
9 2021-09-05 b1 spending 45
10 2021-09-06 b1 income 30
11 2021-09-06 b1 income 30
12 2021-09-07 b1 spending 25
>>> df.value.where(df.action=='income', -df.value).groupby(df.id).shift().groupby(df.id).cumsum()
>>> df
date id action value saved
0 2021-09-02 aa income 500 0.0
1 2021-09-02 aa spending 400 500.0
2 2021-09-02 aa spending 40 100.0
3 2021-09-03 aa income 30 60.0
4 2021-09-03 aa income 30 90.0
5 2021-09-03 aa spending 25 120.0
6 2021-09-04 b1 income 100 0.0
7 2021-09-05 b1 income 500 100.0
8 2021-09-05 b1 spending 500 600.0
9 2021-09-05 b1 spending 45 100.0
10 2021-09-06 b1 income 30 55.0
11 2021-09-06 b1 income 30 85.0
12 2021-09-07 b1 spending 25 115.0