Home > Net >  Adding a calculated metric in multiindex pandas dataframe
Adding a calculated metric in multiindex pandas dataframe

Time:11-06

I have a df:

date            category            subcategory         order_id        product_id
2021-05-04      A                   aa                  10              5
2021-06-04      A                   dd                  10              2
2021-05-06      B                   aa                  18              3
2021-07-06      A                   aa                  50              10
2021-07-06      C                   cc                  10              15
2021-07-05      A                   ff                  101             30
2021-10-04      D                   aa                  100             15

I am trying to calculate weekly items per order metric which is :

count(product_id) / count((unique(order_id)) # in pseudocode

I tried:

a = a.set_index('date').groupby(['category','subcategory'])['order_id', 'product_id']
                       .resample('W-MON', label = 'left')
                       .agg({'order_id':pd.Series.nunique, 'product_id':pd.Series.count}).unstack()
a.fillna(0, inplace = True)

Which returns the data in a way that its repeated 2 times:

                                order_id                                                product_id
                                order_id                    product_id                  order_id                    product_id
                date            2021-05-04  ... ...         2021-05-04  ... ...         2021-05-04  ... ...         2021-05-04  ... ...         
category        subcategory     
A               aa              1           ... ...         2           ... ...         1           ... ...         2           ... ...
A               dd              2           ... ...         1           ... ...         2           ... ...         1           ... ...

I am trying to create this kind of structure:

                                order_id        product_id          product_per_order           order_id        product_id          product_per_order                                 
                date            2021-05-04                                                      2021-05-11                      
category        subcategory     
A               aa              1               2                   2/1                         2               3                   3/2
A               dd              2               1                   1/2                         0               0                   0 # & to avoid division by 0
...             ...  

CodePudding user response:

Remove list after groupby, then add new column with division by DataFrame.assign and last reshape with unstack - if necessary sorting pre datetimes:

a = (a.set_index('date')
     .groupby(['category','subcategory'])
     .resample('W-MON', label = 'left')
     .agg({'order_id':pd.Series.nunique, 'product_id':pd.Series.count})
     .assign(product_per_order = lambda x: x['product_id'].div(x['order_id']))
     .unstack()
     .fillna(0)
     .sort_index(level=1, axis=1, sort_remaining=False))
  • Related