I have a df that looks like this:
date        category  subcategory  order_id
2021-05-04  A         aa           10
2021-06-04  A         dd           10
2021-05-06  B         aa           18
2021-07-06  A         aa           50
2021-07-06  C         cc           10
2021-07-05  A         ff           101
2021-10-04  D         aa           100
I am trying to get the count of unique orders per week, category & subcategory:
pd.pivot_table(df, index=['date', 'category', 'subcategory'],
values=['order_id'],
aggfunc=lambda x: len(x.unique())).unstack(level=0)
This returns a result, but it is not grouped by week. Is it possible to group by week during the groupby operation?
I've checked this answer, but it seems to be outdated, as I get an error:
TypeError: resample() got an unexpected keyword argument 'how'
CodePudding user response:
You can use groupby with DataFrameGroupBy.resample; the output is:
f = lambda x: x.nunique()
df = df.set_index('date').groupby(['category','subcategory'])['order_id'].resample('W').agg(f).unstack()
print (df)
date 2021-05-09 2021-05-16 2021-05-23 2021-05-30 \
category subcategory
A aa 1.0 0.0 0.0 0.0
dd NaN NaN NaN NaN
ff NaN NaN NaN NaN
B aa 1.0 NaN NaN NaN
C cc NaN NaN NaN NaN
D aa NaN NaN NaN NaN
date 2021-06-06 2021-06-13 2021-06-20 2021-06-27 \
category subcategory
A aa 0.0 0.0 0.0 0.0
dd 1.0 NaN NaN NaN
ff NaN NaN NaN NaN
B aa NaN NaN NaN NaN
C cc NaN NaN NaN NaN
D aa NaN NaN NaN NaN
date 2021-07-04 2021-07-11 2021-10-10
category subcategory
A aa 0.0 1.0 NaN
dd NaN NaN NaN
ff NaN 1.0 NaN
B aa NaN NaN NaN
C cc NaN 1.0 NaN
D aa NaN NaN 1.0
Or, if you use Grouper, the output is:
f = lambda x: x.nunique()
df = df.groupby(['category','subcategory', pd.Grouper(freq='W', key='date')])['order_id'].agg(f).unstack()
print (df)
date 2021-05-09 2021-07-11 2021-06-06 2021-10-10
category subcategory
A aa 1.0 1.0 NaN NaN
dd NaN NaN 1.0 NaN
ff NaN 1.0 NaN NaN
B aa 1.0 NaN NaN NaN
C cc NaN 1.0 NaN NaN
D aa NaN NaN NaN 1.0
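If you would rather keep the pivot_table call from the question, pd.Grouper also works inside pivot_table, so the original approach only needs the date moved into columns with a weekly grouper. A minimal sketch using the sample data from the question (the fillna/astype step is an optional cleanup, assuming integer counts are preferred over NaN):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'date': pd.to_datetime(['2021-05-04', '2021-06-04', '2021-05-06',
                            '2021-07-06', '2021-07-06', '2021-07-05',
                            '2021-10-04']),
    'category': ['A', 'A', 'B', 'A', 'C', 'A', 'D'],
    'subcategory': ['aa', 'dd', 'aa', 'aa', 'cc', 'ff', 'aa'],
    'order_id': [10, 10, 18, 50, 10, 101, 100],
})

# pd.Grouper resamples the dates to weekly bins directly inside
# pivot_table; 'nunique' counts distinct order_ids per bin
out = pd.pivot_table(df,
                     index=['category', 'subcategory'],
                     columns=pd.Grouper(freq='W', key='date'),
                     values='order_id',
                     aggfunc='nunique')

# Missing (category, subcategory, week) combinations come back as NaN;
# fill them and cast to int for cleaner counts
out = out.fillna(0).astype(int)
print(out)
```

The columns come out as sorted week-ending dates, which avoids the unordered columns seen in the Grouper-based groupby output above.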