I tried looking for the answer for my question but can not find exactly what I'm looking for. I have revenue for a customer and product by day(each customer does not have rev per product each day) and am looking to know the 10 days revenue prior and post of that activity date for that customer, product
Current Input
customer | product | activity_date | revenue |
---|---|---|---|
43 | product a | 2022-01-04 | 45.24 |
89 | product b | 2021-06-17 | 9294.21 |
35 | product a | 2020-05-20 | 75.12 |
Expected Output
customer | product | activity_date | revenue | rev_10_d_prior | rev_10_d_post |
---|---|---|---|---|---|
43 | product a | 2022-01-04 | 45.24 | ||
89 | product b | 2021-06-17 | 9294.21 | ||
35 | product a | 2020-05-20 | 75.12 |
CodePudding user response:
Try this code to add 2 new columns to your dataframe
dataframe.insert(4,'rev_10_d_prior','')
dataframe.insert(5,'rev_10_d_post','')
CodePudding user response:
If df
is your dataframe and the column activity_date
is a datetime
column (sorted) then you could try to use .rolling()
:
df["rev_10_d_prior"] = (
df.groupby(["customer", "product"], as_index=False)
.rolling("11D", on="activity_date", closed="left").sum()
.revenue
)
df["rev_10_d_post"] = (
df[::-1].groupby(["customer", "product"], as_index=False)
.rolling("11D", on="activity_date", closed="left").sum()[::-1]
.revenue
)