I'm trying to create a running sum on a Dataframe that follows some conditions, sum all previous values with same 'first index current day' value and condiction is true, the pseudocode would be something like
df_ = df[first index current day:current index] #Delimiter dataframe range from begin of day until current row
df_ = df_[df_['toSum'] == True] #drop False
result = sum(df_['sales']) #Sum all sales until now with 'toSum' == True
This is very easy using a loop or even a apply but take too much time, my dataframe have more than 1000k rows, the best I could do until now is use threads but if it's possible to vectorize that algorithm it would be a lot fastter
The result is something like this:
Actual Id | First ID Actual Day | Sales | toSum | AcumulativeDay |
---|---|---|---|---|
0 | 0 | 1 | True | 1 |
1 | 0 | 1 | True | 2 |
2 | 0 | 1 | False | 2 |
3 | 0 | 1 | False | 2 |
4 | 0 | 1 | True | 3 |
5 | 5 | 1 | True | 1 |
6 | 5 | 1 | True | 2 |
7 | 5 | 1 | False | 2 |
8 | 5 | 1 | False | 2 |
9 | 5 | 1 | True | 3 |
CodePudding user response:
After having the table filtered as you did use:
df['cumsum'] = df['sales'].cumsum()
You may want to have the dataframe sorted by date.
CodePudding user response:
Use groupby
over First ID Actual Day
to get all previous values with same first day ID. Then, multiply the boolean column toSum
with Sales
so that cumsum
can interpret the False
values as zeros. Remove the indices with droplevel
and make the attribution to the variable.
import pandas as pd
d= {'Actual Id': {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9},
'First ID Actual Day': {0: 0,1: 0,2: 0,3: 0,4: 0,5: 5,6: 5,7: 5,8: 5,9: 5},
'Sales': {0: 1.0,1: 11.33,2: 1.0,3: 31.2,4: 1.0,5: 241.14,6: 1.0,7: 1.0,8: 1.0,9: 54.22},
'toSum': {0: True,1: True,2: False,3: False,4: True,5: True,6: True,7: False,8: False,9: True}
}
df = pd.DataFrame(d)
df['AcumulativeDay'] = df.groupby(['First ID Actual Day'])\
.apply(lambda x: (x['Sales']*x['toSum'])\
.cumsum()).droplevel(0)
Actual Id First ID Actual Day Sales toSum AcumulativeDay
0 0 0 1.00 True 1.00
1 1 0 11.33 True 12.33
2 2 0 1.00 False 12.33
3 3 0 31.20 False 12.33
4 4 0 1.00 True 13.33
5 5 5 241.14 True 241.14
6 6 5 1.00 True 242.14
7 7 5 1.00 False 242.14
8 8 5 1.00 False 242.14
9 9 5 54.22 True 296.36