A pandas dataframe df_sales is populated as shown below. For each ID, there are multiple dates, all grouped & sorted in the order shown prior to the problem.
ID Date Sales
a 01/01/2020 20
a 02/01/2020 65
a 03/01/2020 31
b 01/01/2020 25
b 02/01/2020 15
b 03/01/2020 42
c 01/01/2020 31
c 02/01/2020 50
c 03/01/2020 25
Now, I would like to see sum of last 2 days sales sales in the same dataframe as shown below.
ID Date Sales Sales in last 2 days
a 01/01/2020 20 20
a 02/01/2020 65 85
a 03/01/2020 31 96
b 01/01/2020 25 25
b 02/01/2020 15 40
b 03/01/2020 42 57
c 01/01/2020 31 31
c 02/01/2020 50 81
c 03/01/2020 25 75
Could you please suggest how could this be possible? Many thanks in advance.
I tried grouping and aggregating however I couldn't iterate on the last 2 days alone, especially when there was no previous date value.
CodePudding user response:
Hope I understand correctly, tried using Shift to get previous row value to sum and create a new column named 'Sales in last 2 days'
import pandas as pd
id_col = ['a','a','a','b','b','b','c','c','c']
date_rng = ['2020-01-01','2020-01-02','2020-01-03','2020-01-01','2020-01-02','2020-01-03',\
'2020-01-01','2020-01-02','2020-01-03']
sales = [20,65,31,25,15,29,23,5,65]
df = pd.DataFrame({'ID':id_col, 'Date':date_rng, 'Sales':sales})
id_grp = df.groupby(['ID'])
df_list = []
for df_grp in id_grp:
df_grp[1]['Sales in last 2 days'] = df_grp[1]['Sales'] df_grp[1]['Sales'].shift().fillna(0)
df_list.append(df_grp[1])
print(pd.concat(df_list))