I have a df:
date        category  subcategory  order_id  product_id
2021-05-04  A         aa           10        5
2021-06-04  A         dd           10        2
2021-05-06  B         aa           18        3
2021-07-06  A         aa           50        10
2021-07-06  C         cc           10        15
2021-07-05  A         ff           101       30
2021-10-04  D         aa           100       15
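For reference, the frame can be rebuilt with the same values like this (date parsed to datetime):

import pandas as pd

a = pd.DataFrame({
    'date': pd.to_datetime(['2021-05-04', '2021-06-04', '2021-05-06',
                            '2021-07-06', '2021-07-06', '2021-07-05', '2021-10-04']),
    'category': ['A', 'A', 'B', 'A', 'C', 'A', 'D'],
    'subcategory': ['aa', 'dd', 'aa', 'aa', 'cc', 'ff', 'aa'],
    'order_id': [10, 10, 18, 50, 10, 101, 100],
    'product_id': [5, 2, 3, 10, 15, 30, 15],
})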
I am trying to calculate a weekly items-per-order metric, which is:
count(product_id) / count(unique(order_id))
# in pseudocode
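On the whole frame, ignoring the weekly split, that is roughly:

items_per_order = a['product_id'].count() / a['order_id'].nunique()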
I tried:
a = (a.set_index('date').groupby(['category','subcategory'])['order_id', 'product_id']
      .resample('W-MON', label='left')
      .agg({'order_id': pd.Series.nunique, 'product_id': pd.Series.count})
      .unstack())
a.fillna(0, inplace=True)
This returns the data with the columns repeated twice:
                     order_id                              product_id
                     order_id           product_id         order_id           product_id
date                 2021-05-04 ... ... 2021-05-04 ... ... 2021-05-04 ... ... 2021-05-04 ... ...
category subcategory
A        aa          1          ... ... 2          ... ... 1          ... ... 2          ... ...
A        dd          2          ... ... 1          ... ... 2          ... ... 1          ... ...
I am trying to create this kind of structure:
                     order_id   product_id  product_per_order  order_id   product_id  product_per_order
date                 2021-05-04                                2021-05-11
category subcategory
A        aa          1          2           2/1                2          3           3/2
A        dd          2          1           1/2                0          0           0    # & to avoid division by 0
...      ...
CodePudding user response:
Remove the list after groupby, then add the new column with the division via DataFrame.assign, and finally reshape with unstack. If necessary, sort the columns by the datetimes:
a = (a.set_index('date')
      .groupby(['category','subcategory'])
      .resample('W-MON', label='left')
      .agg({'order_id': pd.Series.nunique, 'product_id': pd.Series.count})     # unique orders, product rows per week
      .assign(product_per_order=lambda x: x['product_id'].div(x['order_id']))  # ratio per (group, week)
      .unstack()                                                                # move weeks to columns
      .fillna(0)
      .sort_index(level=1, axis=1, sort_remaining=False))                      # group columns by week
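About the division by 0 from your expected output: for empty weeks both counts are 0, so div returns NaN and the fillna(0) already covers it, but you can also guard the denominator explicitly. A small variation, assuming numpy is imported as np:

import numpy as np

a = (a.set_index('date')
      .groupby(['category','subcategory'])
      .resample('W-MON', label='left')
      .agg({'order_id': 'nunique', 'product_id': 'count'})
      .assign(product_per_order=lambda x: x['product_id']
                                           .div(x['order_id'].replace(0, np.nan))  # 0 orders -> NaN denominator
                                           .fillna(0))                             # -> ratio becomes 0, never inf
      .unstack()
      .fillna(0)
      .sort_index(level=1, axis=1, sort_remaining=False))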