I am trying to calculate a running total per customer for the previous 365 days using pandas but my code isn't working. My intended output would be something like this:
date | customer | daily_total_per_customer | rolling_total |
---|---|---|---|
2016-07-29 | 1 | 100 | 100 |
2016-08-01 | 1 | 50 | 150 |
2017-01-12 | 1 | 80 | 230 |
2017-10-23 | 1 | 180 | 260 |
2018-03-03 | 1 | 0 | 180 |
2018-03-06 | 1 | 40 | 220 |
2019-03-16 | 1 | 500 | 500 |
2017-04-07 | 2 | 50 | 50 |
2017-04-09 | 2 | 230 | 280 |
2018-02-11 | 2 | 80 | 360 |
2018-05-12 | 2 | 0 | 80 |
2019-05-10 | 2 | 0 | 0 |
I tried the following:
df_3 = df_3.set_index(['customer', 'date']).sort_values(by='date')
rolling_sum = df_3.rolling('365d', on='date')["daily_total_per_customer"].sum()
df_3["rolling_total"] = rolling_sum
And I get the following error
ValueError: invalid on specified as date, must be a column (of DataFrame), an Index or None To recreate the code:
dates = ['2016-07-29',
'2016-08-01',
'2017-01-12',
'2017-10-23',
'2018-03-03',
'2018-03-06',
'2019-03-16',
'2017-04-07',
'2017-04-09',
'2018-02-11',
'2018-05-12',
'2019-05-10',
]
customer = [1,1,1,1,1,1,1,2,2,2,2,2]
daily_total = [100,50,80,180,0,40,500,50,230,80,0,0]
df = pd.DataFrame({'date': dates,
'customer': customer,
'daily_total_per_customer':daily_total,})
Perhaps someone can point me in the right direction. Thanks!
CodePudding user response:
Annotated code
# Parse the strings to datetime
df['date'] = pd.to_datetime(df['date'])
# Sort the dates in ASC order if not already sorted
df = df.sort_values(['customer', 'date'])
# Group the dataframe by customer then for each group
# calculate rolling sum on 'daily_total_per_customer'
s = df.groupby('customer').rolling('365d', on='date')['daily_total_per_customer'].sum()
# Merge the result with original df
df.merge(s.reset_index(name='rolling_total'))
date customer daily_total_per_customer rolling_total
0 2016-07-29 1 100 100
1 2016-08-01 1 50 150
2 2017-01-12 1 80 230
3 2017-10-23 1 180 260
4 2018-03-03 1 0 180
5 2018-03-06 1 40 220
6 2019-03-16 1 500 500
7 2017-04-07 2 50 50
8 2017-04-09 2 230 280
9 2018-02-11 2 80 360
10 2018-05-12 2 0 80
11 2019-05-10 2 0 0