Trailing cumulative values using Python

Time:10-27

I would like to perform a specific calculation across columns, pwr - base + final, to create a 'used' column. This has to be done per id, with the rows in date order. Once the first value is established, each following row is cumulative on the newly created used column: the previous row's used + final, for each date row.
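Written out for one id group, with its rows numbered t = 1, ..., n in date order, the intended calculation is the recurrence below. Unrolling it shows that each row is pwr - base plus the running total of final. Here pwr and base are taken from the group's first row; in the sample data they are constant within each id.

$$
\begin{aligned}
\text{used}_1 &= \text{pwr} - \text{base} + \text{final}_1 \\
\text{used}_t &= \text{used}_{t-1} + \text{final}_t, \quad t = 2, \dots, n \\
\Rightarrow\ \text{used}_t &= \text{pwr} - \text{base} + \sum_{k=1}^{t} \text{final}_k
\end{aligned}
$$

For example, for id aa at t = 3: 100 - 50 + (15 + 25 + 10) = 100.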

Data

pwr  pos  id  date  base  position  final
100  40   aa  q121  50    20        15
100  40   aa  q221  50    20        25
100  40   aa  q321  50    20        10
100  40   aa  q421  50    20        5
50   30   bb  q121  25    10        0
50   30   bb  q221  25    10        10

Desired

pwr  pos  id  date  base  position  final  used
100  40   aa  q121  50    20        15     65
100  40   aa  q221  50    20        25     90
100  40   aa  q321  50    20        10     100
100  40   aa  q421  50    20        5      105
50   30   bb  q121  25    10        0      25
50   30   bb  q221  25    10        10     35
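A minimal plain-Python sketch of the recurrence, useful as a slow but obvious reference to check any vectorized answer against. It is not from the question; `trailing_used` is a hypothetical helper name, and it assumes the rows are already in date order within each id.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'pwr':      [100, 100, 100, 100, 50, 50],
    'pos':      [40, 40, 40, 40, 30, 30],
    'id':       ['aa', 'aa', 'aa', 'aa', 'bb', 'bb'],
    'date':     ['q121', 'q221', 'q321', 'q421', 'q121', 'q221'],
    'base':     [50, 50, 50, 50, 25, 25],
    'position': [20, 20, 20, 20, 10, 10],
    'final':    [15, 25, 10, 5, 0, 10],
})


def trailing_used(frame):
    """Hypothetical helper: apply the recurrence row by row.

    First row of each id: pwr - base + final.
    Later rows:           previous used + final.
    Assumes rows are already sorted by date within each id.
    """
    used = []
    previous = {}  # last 'used' value seen for each id
    for pwr, base, id_, final in zip(frame['pwr'], frame['base'],
                                     frame['id'], frame['final']):
        if id_ not in previous:
            value = pwr - base + final
        else:
            value = previous[id_] + final
        previous[id_] = value
        used.append(value)
    return used


df['used'] = trailing_used(df)
print(df)
```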

What I tried:

df['used'] =  df['pwr'].sub(df.groupby('id')['date'].cumsum()).sub(df.groupby('id')['base'].cumsum()).add(df.groupby('id')['final'].cumsum())

However, the resulting column isn't calculated correctly. Any suggestions are appreciated.

Answer:

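First take pwr - base only on the first row of each id and set it to NaN on the repeated rows, then add final (with fill_value=0, so the NaN rows just get final), and finally take a cumsum within each id.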

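# pwr - base on the first row of each id only (repeated ids become NaN),
# then add final; fill_value=0 treats those NaN values as 0
s = df['pwr'].sub(df['base']).mask(df['id'].duplicated()).add(df['final'], fill_value=0)

# running total within each id
df['new'] = s.groupby(df['id']).cumsum()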
df
   pwr  pos  id  date  base  position  final    new
0  100   40  aa  q121    50        20     15   65.0
1  100   40  aa  q221    50        20     25   90.0
2  100   40  aa  q321    50        20     10  100.0
3  100   40  aa  q421    50        20      5  105.0
4   50   30  bb  q121    25        10      0   25.0
5   50   30  bb  q221    25        10     10   35.0
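Because pwr and base are constant within each id in this data, the same result can also be written as pwr - base plus a per-id running total of final. This is a swapped-in alternative, not the answer's method, and it assumes pwr and base do not change within an id (if they can, the answer's first-row masking is the safer choice). It also assumes the rows are already in date order. A side effect is that the column stays integer instead of becoming float, since no NaN is introduced. The sketch below also runs the answer's code on the same data to check that both agree.

```python
import pandas as pd

df = pd.DataFrame({
    'pwr':   [100, 100, 100, 100, 50, 50],
    'id':    ['aa', 'aa', 'aa', 'aa', 'bb', 'bb'],
    'date':  ['q121', 'q221', 'q321', 'q421', 'q121', 'q221'],
    'base':  [50, 50, 50, 50, 25, 25],
    'final': [15, 25, 10, 5, 0, 10],
})

# Running total of 'final' within each id, in the existing row order
running_final = df.groupby('id')['final'].cumsum()

# Valid only if pwr and base are constant per id (assumption)
df['used'] = df['pwr'] - df['base'] + running_final
print(df)

# The answer's approach on the same data, for comparison (float result)
s = df['pwr'].sub(df['base']).mask(df['id'].duplicated()).add(df['final'], fill_value=0)
answer = s.groupby(df['id']).cumsum()
print((answer == df['used']).all())
```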