Home > Blockchain >  How to create weighted average of dataset per date?
How to create weighted average of dataset per date?

Time:06-07

I have a dataset,df that looks as follows:

Date Code Population QTY
2021-01-01 1000 10000 5000
2021-01-02 1000 10000 5013
2021-01-01 2200 33000 10022
2021-01-02 2200 33000 10002

I would like to take the daily weighted average of my dataframe, where my weight is my Population, and store it in a new dataset, df_avg that contains only Date and WEIGHTED QTY. How can I do that?

CodePudding user response:

There are two ways: 1) groupby.apply with np.average and 2) manually calculate with lazy groupby.

Path 1 is easier to code, but a little slower:

df.groupby('Date')\
  .apply(lambda x: np.average(x['QTY'], weights=x['Population']))\
  .reset_index(name='avg')

Path 2 is faster, but more verbose:

df['prods'] = df['QTY'] * df['Population']

groups = df.groupby('Date')

groups['prods'].sum().div(groups['Population'].sum()).reset_index(name='avg')

Either way gives you the output:

         Date          avg
0  2021-01-01  8854.093023
1  2021-01-02  8841.767442

Note, an alternative to Path 2 without adding new column to data involves groupby twice:

df['QTY'].mul(df['Population']).groupby(df['Date']).sum()\
   .div(df.groupby('Date')['Population'].sum())\
   .reset_index(name='avg')

Another option, involving the pipe method:

(df
.assign(weighted_qty = df.Population * df.QTY)
.groupby('Date')
.pipe(lambda df: df.weighted_qty.sum()/df.Population.sum())
)
Date
2021-01-01    8854.093023
2021-01-02    8841.767442
dtype: float64
  • Related