I have a transaction data frame as follows.
import pandas as pd
pd.DataFrame({"id":[139,139,139,432,474,164],
"order_date":["2022-01-01","2022-02-01","2022-02-04","2022-02-03","2022-02- 28","2022-01-24"],
"product_id":["A","B","B","E","A","D"],
"sales($)":[10,20,30,20,10,20]})
Given a date_slicer_list=[30,60]
, I want to slice the data as shown below from the max(order_date).
I'm looking for a function which takes date_slicer_list as an argument so that I can give another list if I want more slices of data.
pd.DataFrame({"customer_id":[139,432,474,164],
"last_30days_sales":[50,20,10,0],
"last_30_to_60_days_sales":[10,0,0,10]})
CodePudding user response:
You can use a custom cut
and unstack
:
def f(df, lst):
date = pd.to_datetime(df['order_date'])
MAX = date.max()
bins = pd.to_timedelta([0] lst, unit='D')
labels = lst[:1] ['_'.join(map(str, x)) for x in zip(lst, lst[1:])]
group = pd.cut(pd.to_datetime(df['order_date']),
bins=(MAX-bins)[::-1],
labels=labels[::-1]
)
return (df.groupby(['id', group], sort=False)['sales($)'].sum()
.unstack(1)
.add_prefix('last_').add_suffix('_days_sales')
.reset_index()
.rename_axis(columns=None)
)
f(df, date_slicer_list)
output:
id last_30_60_days_sales last_30_days_sales
0 139 10 50
1 164 20 0
2 432 0 20
3 474 0 10
CodePudding user response:
bins=[0,30,60]
df['order_date'] = pd.to_datetime(df['order_date'])
df['days'] = (df['order_date'].max() - df['order_date']).dt.days
df2=(df.assign(grp=pd.cut(df['days'], bins=bins, include_lowest=True))
.pivot_table(index='id',
columns='grp',
values='sales($)' ,
aggfunc='sum')
.reset_index()
)
df2.columns=['last_' str(col) '_days_of_sales' if str(col)!='id' else col for col in df2.columns ]
df2
id last_(-0.001, 30.0]_days_of_sales last_(30.0, 60.0]_days_of_sales
0 139 50 10
1 164 0 20
2 432 20 0
3 474 10 0