Calculating multiple weighted averages based on multiple weight values - Pandas

Time:10-15

I'm a beginner at Python. I understand how np.average works in general, but I can't figure out how to apply it to my specific case. I want to compute several weighted averages of the same column of a DataFrame, each one using a different set of weights.

Example below:

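import numpy as np
import pandas as pd

# The weights are random, so each run differs from the table below
df = pd.DataFrame({'vals': [50, 99, 12, 33],
                   'weight1': np.random.randint(0, 50, 4),
                   'weight2': np.random.randint(0, 50, 4),
                   'weight3': np.random.randint(0, 50, 4)})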

   vals  weight1  weight2  weight3
0    50       39       11        9
1    99       37       17       27
2    12       22       29        0
3    33       39       17       47

I'm looking to create a column (in a separate dataframe) that has the weighted average of 'vals' for each set of weights that I'm using.

So the output would look something like this:

   weights  weightedvals
0  weight1         52.29
1  weight2         42.46
2  weight3         56.31

I understand how to get each weighted average individually, with something like

np.average(df['vals'], weights=df['weight1'])
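For reference, np.average with weights= divides the weighted sum by the sum of the weights. A small sketch that checks this against the 'weight1' values from the table above (hard-coded here, since the question's weights are random):

```python
import numpy as np
import pandas as pd

# Only 'vals' and 'weight1' from the table above are needed here
df = pd.DataFrame({'vals': [50, 99, 12, 33],
                   'weight1': [39, 37, 22, 39]})

# np.average with weights= returns sum(vals * w) / sum(w)
by_numpy = np.average(df['vals'], weights=df['weight1'])
by_hand = (df['vals'] * df['weight1']).sum() / df['weight1'].sum()

print(round(by_numpy, 2))  # 52.29
```

Here the weighted sum is 7164 and the total weight is 137, which gives the 52.29 in the expected output.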

But I'm stuck on how to do this for multiple sets of weights. I've tried a few solutions, but they're aimed at applying the same weights to multiple columns.

Thank you!!

Answer:

You might be looking for a list comprehension, something like this:

[np.average(df['vals'], weights=df[w]) for w in df.columns[1:]]

This builds a list whose first element is the weighted average using 'weight1', the second using 'weight2', and so on. You can read it as a compact for-loop; it is usually a little faster than an explicit loop that appends to a list, although with only a few weight columns the difference is negligible. df.columns is a pandas Index of the column names, so df.columns[1:] holds every column name except the first one ('vals').
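One caveat: df.columns[1:] assumes 'vals' is the first column. Below is a sketch of a loop-free alternative (a different technique from the list comprehension, not part of the original answer) that selects the weight columns by name with DataFrame.filter(like='weight') and computes all the weighted averages with vectorized pandas operations. The 'weight' naming convention is an assumption taken from the example.

```python
import pandas as pd

# Values from the question's table, hard-coded for reproducibility
df = pd.DataFrame({'vals': [50, 99, 12, 33],
                   'weight1': [39, 37, 22, 39],
                   'weight2': [11, 17, 29, 17],
                   'weight3': [9, 27, 0, 47]})

# Assumption: every weight column name contains 'weight'
weights = df.filter(like='weight')

# Multiply each weight column by 'vals' row-wise, sum down each column,
# then divide by each column's total weight: one weighted average per column
weighted = weights.mul(df['vals'], axis=0).sum() / weights.sum()

# Turn the Series (indexed by column name) into the two-column layout asked for
avg_df = weighted.rename_axis('weights').reset_index(name='weightedvals')
print(avg_df.round(2))
```

This avoids calling np.average once per column, which mainly matters when there are many weight columns.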

So, to get the output you're looking for:

avg = [np.average(df['vals'], weights=df[w]) for w in df.columns[1:]]
avg_df = pd.DataFrame({'weights' : df.columns[1:], 'weightedvals' : avg})
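Put together as a self-contained script, here is the answer's approach with the values from the question's table instead of random weights. It selects the weight columns with df.columns.drop('vals') rather than by position, so column order doesn't matter (a small adaptation, not the answer's exact code):

```python
import numpy as np
import pandas as pd

# Same values as the table in the question (hard-coded so the result is reproducible)
df = pd.DataFrame({'vals': [50, 99, 12, 33],
                   'weight1': [39, 37, 22, 39],
                   'weight2': [11, 17, 29, 17],
                   'weight3': [9, 27, 0, 47]})

# Every column except 'vals' is treated as a set of weights
weight_cols = df.columns.drop('vals')

# One np.average call per weight column, as in the answer
avg = [np.average(df['vals'], weights=df[w]) for w in weight_cols]

# round(2) only affects the numeric column; 'weights' stays as strings
avg_df = pd.DataFrame({'weights': weight_cols, 'weightedvals': avg}).round(2)
print(avg_df)
```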