I'm fairly new to Python. I understand how np.average works in general, but I can't figure out how to get it to work for my specific application. I'm looking to create several weighted averages of one column of a df, each based on a different set of weights.
Example below:
import numpy as np
import pandas as pd
df = pd.DataFrame({'vals': [50, 99, 12, 33],
                   'weight1': np.random.randint(0, 50, 4),
                   'weight2': np.random.randint(0, 50, 4),
                   'weight3': np.random.randint(0, 50, 4)})
vals weight1 weight2 weight3
0 50 39 11 9
1 99 37 17 27
2 12 22 29 0
3 33 39 17 47
I'm looking to create a column (in a separate dataframe) that has the weighted average of 'vals' for each set of weights that I'm using.
So the output would look something like this:
weights weightedvals
0 weight1 52.29
1 weight2 42.46
2 weight3 56.31
I understand how to get these weighted averages individually doing something like
np.average(df['vals'], weights=df['weight1'])
But I'm stuck on how to do this for multiple weight columns. I've tried a few solutions, but they're aimed at applying the same weights to multiple value columns.
Thank you!!
CodePudding user response:
You might be looking for a list comprehension, something like this:
[np.average(df['vals'], weights=df[w]) for w in df.columns[1:]]
This produces a list in which the first element is the average using 'weight1', the second the average using 'weight2', and so on. You can read it as a compressed for-loop; it is usually somewhat faster than a for-loop that appends values to a list. df.columns holds the column names (a pandas Index that behaves much like a list), so df.columns[1:] is the column names with the first one ('vals') left out.
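To make the "compressed for-loop" reading concrete, here is a small sketch that writes the loop out in full and compares it with the comprehension. The weights are the ones printed in the question's table rather than fresh random values, and the names avg_loop and avg_comp are just for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table printed in the question
df = pd.DataFrame({
    'vals': [50, 99, 12, 33],
    'weight1': [39, 37, 22, 39],
    'weight2': [11, 17, 29, 17],
    'weight3': [9, 27, 0, 47],
})

# Explicit loop: one weighted average of 'vals' per weight column
avg_loop = []
for w in df.columns[1:]:
    avg_loop.append(np.average(df['vals'], weights=df[w]))

# The same computation written as a list comprehension
avg_comp = [np.average(df['vals'], weights=df[w]) for w in df.columns[1:]]

# Both lists hold the same three values, in column order
print(avg_loop == avg_comp)
```

Either form gives the same result; the comprehension is just shorter.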
So to get the output you're looking for, just build a new DataFrame from that list:
avg = [np.average(df['vals'], weights=df[w]) for w in df.columns[1:]]
avg_df = pd.DataFrame({'weights' : df.columns[1:], 'weightedvals' : avg})
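For reference, here is a self-contained version you can run as is. It is a sketch under a couple of assumptions: the weights are fixed to the values shown in the question's table, and the weight columns are listed by name in a variable called weight_cols (my own name) instead of relying on df.columns[1:], which only works while 'vals' is the first column. The last part is an optional pandas-only alternative that should give the same numbers.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table printed in the question
df = pd.DataFrame({
    'vals': [50, 99, 12, 33],
    'weight1': [39, 37, 22, 39],
    'weight2': [11, 17, 29, 17],
    'weight3': [9, 27, 0, 47],
})

# Name the weight columns explicitly (assumption: these are the ones you want)
weight_cols = ['weight1', 'weight2', 'weight3']

# One weighted average of 'vals' per weight column
avg = [np.average(df['vals'], weights=df[w]) for w in weight_cols]
avg_df = pd.DataFrame({'weights': weight_cols, 'weightedvals': avg})
print(avg_df.round(2))

# Alternative without a Python-level loop:
# sum(vals * weights) / sum(weights), computed for every weight column at once
vectorized = df[weight_cols].mul(df['vals'], axis=0).sum() / df[weight_cols].sum()
print(np.allclose(vectorized.to_numpy(), avg))
```

The vectorized form returns a Series indexed by the weight column names, which may be handy if you have many weight columns.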