I have 2 dataframes - One with the values under each group. The second dataframe has weights for each group to get the weighted average.
df1 =
product group1 group2 group3 group4 group5
xyz 223056 907858 4803 586623 835423
abc 557042 382306 915624 786236 413817
pqr 951348 723503 587397 141975 567826
df2 =
group weight
group1 0.5
group2 7.5
group3 10
group4 1.5
group5 50
I'd like to get a weighted average of all groups for each product and append it to df1. For example :
product group1 group2 group3 group4 group5 weighted-average
xyz 223056 907858 4803 586623 835423 40
abc 557042 382306 915624 786236 413817 25
pqr 951348 723503 587397 141975 567826 30
Is there an easy way to do this? I managed to do this in excel, but I would like to try it in pandas as I'm learning it still.
CodePudding user response:
Use np.average
with DataFrame.reindex
for same order columns and index:
s = df2.set_index('group')['weight']
df1['wa'] = np.average(df1.reindex(columns=s.index), weights=s, axis=1)
print (df1)
product group1 group2 group3 group4 group5 wa
0 xyz 223056 907858 4803 586623 835423 713950.755396
1 abc 557042 382306 915624 786236 413817 491687.194245
2 pqr 951348 723503 587397 141975 567826 581009.769784
If always same order is possible use:
df1 = df1.set_index('product')
df1['wa'] = np.average(df1, weights=df2['weight'], axis=1)
print (df1)
group1 group2 group3 group4 group5 wa
product
xyz 223056 907858 4803 586623 835423 713950.755396
abc 557042 382306 915624 786236 413817 491687.194245
pqr 951348 723503 587397 141975 567826 581009.769784
CodePudding user response:
You can mul
tiply by the weight, sum
and div
ide by the sum of weights:
df1['wa'] = (df1.mul(df2.set_index('group')['weight'])
.sum(axis=1).div(df2['weight'].sum())
)
output:
product group1 group2 group3 group4 group5 wa
0 xyz 223056 907858 4803 586623 835423 713950.755396
1 abc 557042 382306 915624 786236 413817 491687.194245
2 pqr 951348 723503 587397 141975 567826 581009.769784