I have a DataFrame like so:
time_start pred time_end_floor xyz
0 2022-05-06 12:00:00 26 NaT NaN
1 2022-05-06 13:00:00 16 NaT NaN
2 2022-05-06 14:00:00 10 2022-05-06 14:00:00 4.0
3 2022-05-06 15:00:00 8 2022-05-06 15:00:00 9.0
4 2022-05-06 16:00:00 11 2022-05-06 16:00:00 15.0
I need to calculate a running total but whilst also subtracting values based on when a value in time_end_floor
occurs.
EG)
The running total goes like 26 16 10 until 14:00:00 and then I need to subtract the value from column xyz
4. So the value I want in another column at 14:00:00 is 48 as it is (26 16 10) - 4. So 48 goes into the other column at 14:00:00.
Then the next row adds on 8 to 48 to equal 56 but at the same time subtracts 9 so in the other column at 15:00:00 it should show 47 and so on.
At the end of the DataFrame I should be back to 0 as the last entry. Ideally want this as quick as possible as I know it could be done using apply()
but could cause severe overhead.
Any help really appreciated! :)
CodePudding user response:
The exact expected output is unclear (do you need a shift?), but you can use:
df['sum'] = df['pred'].cumsum().sub(df['xyz'].fillna(0, downcast='infer').cumsum())
unless you want 26 16 10 for all the NaN values? Here as sum2
s = df['pred'].cumsum()
df['sum2'] = s.sub(df['xyz'].cumsum()).fillna(s.mask(df['xyz'].isna()).bfill(), downcast='infer')
output:
time_start pred time_end_floor xyz sum sum2
0 2022-05-06 12:00:00 26 NaT NaN 26 52
1 2022-05-06 13:00:00 16 NaT NaN 42 52
2 2022-05-06 14:00:00 10 2022-05-06 14:00:00 4.0 48 48
3 2022-05-06 15:00:00 8 2022-05-06 15:00:00 9.0 47 47
4 2022-05-06 16:00:00 11 2022-05-06 16:00:00 15.0 43 43