I would like to perform a calculation across columns, pwr - base + final, to create a 'used' column. However, this has to be grouped by id and date. Once that first value is established for an id, the second calculation is a cumulative sum on the newly created used column: the previous used value + final for each subsequent date row.
Data
pwr pos id date base position final
100 40 aa q121 50 20 15
100 40 aa q221 50 20 25
100 40 aa q321 50 20 10
100 40 aa q421 50 20 5
50 30 bb q121 25 10 0
50 30 bb q221 25 10 10
Desired
pwr pos id date base position final used
100 40 aa q121 50 20 15 65
100 40 aa q221 50 20 25 90
100 40 aa q321 50 20 10 100
100 40 aa q421 50 20 5 105
50 30 bb q121 25 10 0 25
50 30 bb q221 25 10 10 35
Doing
df['used'] = df['pwr'].sub(df.groupby('id')['date'].cumsum()).sub(df.groupby('id')['base'].cumsum()).add(df.groupby('id')['final'].cumsum())
However, the resulting used column isn't calculated correctly. Any suggestions are appreciated.
CodePudding user response:
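We first need pwr - base on the first row of each id only, so we mask the repeated rows of each id to NaN. Then we add final (with fill_value=0 so the masked rows count as 0) and take a groupby with cumsum.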
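# pwr - base on each id's first row only (repeats -> NaN), plus final (NaN treated as 0)
s = df['pwr'].sub(df['base']).mask(df['id'].duplicated()).add(df['final'],fill_value=0)
# running total within each id
df['new'] = s.groupby(df['id']).cumsum()
df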
pwr pos id date base position final new
0 100 40 aa q121 50 20 15 65.0
1 100 40 aa q221 50 20 25 90.0
2 100 40 aa q321 50 20 10 100.0
3 100 40 aa q421 50 20 5 105.0
4 50 30 bb q121 25 10 0 25.0
5 50 30 bb q221 25 10 10 35.0
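For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question and uses a slightly different but equivalent route: take pwr - base from the first row of each id as a starting value, then add the running total of final. It assumes the rows are already in the desired date order within each id (the quarter labels are plain strings, so sorting on them is not necessarily chronological), and the variable name start is just for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "pwr": [100, 100, 100, 100, 50, 50],
    "pos": [40, 40, 40, 40, 30, 30],
    "id": ["aa", "aa", "aa", "aa", "bb", "bb"],
    "date": ["q121", "q221", "q321", "q421", "q121", "q221"],
    "base": [50, 50, 50, 50, 25, 25],
    "position": [20, 20, 20, 20, 10, 10],
    "final": [15, 25, 10, 5, 0, 10],
})

# Starting value per id: pwr - base taken from the first row of each group
# (assumes rows are already ordered by date within each id).
start = df["pwr"].sub(df["base"]).groupby(df["id"]).transform("first")

# Running total of final within each id, added to the starting value.
df["used"] = start + df.groupby("id")["final"].cumsum()

print(df)
```

Because nothing is masked to NaN here, the used column stays integer rather than becoming float, which may or may not matter for your use case.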