Is there a way to calculate weighted average from 2 different dataframes?-CodePudding

I have 2 dataframes - One with the values under each group. The second dataframe has weights for each group to get the weighted average.

df1 =

product     group1  group2  group3  group4  group5
xyz         223056  907858  4803    586623  835423 
abc         557042  382306  915624  786236  413817 
pqr         951348  723503  587397  141975  567826

df2 =

group    weight
group1   0.5
group2   7.5
group3   10
group4   1.5
group5   50

I'd like to get a weighted average of all groups for each product and append it to df1. For example :

product     group1  group2  group3  group4  group5 weighted-average
xyz         223056  907858  4803    586623  835423   40
abc         557042  382306  915624  786236  413817   25
pqr         951348  723503  587397  141975  567826   30

Is there an easy way to do this? I managed to do this in excel, but I would like to try it in pandas as I'm learning it still.

CodePudding user response：

Use np.average with DataFrame.reindex for same order columns and index:

s = df2.set_index('group')['weight']
df1['wa'] = np.average(df1.reindex(columns=s.index), weights=s, axis=1)

print (df1)
  product  group1  group2  group3  group4  group5             wa
0     xyz  223056  907858    4803  586623  835423  713950.755396
1     abc  557042  382306  915624  786236  413817  491687.194245
2     pqr  951348  723503  587397  141975  567826  581009.769784

If always same order is possible use:

df1 = df1.set_index('product')
df1['wa'] = np.average(df1, weights=df2['weight'], axis=1)

print (df1)
         group1  group2  group3  group4  group5             wa
product                                                       
xyz      223056  907858    4803  586623  835423  713950.755396
abc      557042  382306  915624  786236  413817  491687.194245
pqr      951348  723503  587397  141975  567826  581009.769784

CodePudding user response：

You can multiply by the weight, sum and divide by the sum of weights:

df1['wa'] = (df1.mul(df2.set_index('group')['weight'])
                .sum(axis=1).div(df2['weight'].sum())
             )

output:

  product  group1  group2  group3  group4  group5             wa
0     xyz  223056  907858    4803  586623  835423  713950.755396
1     abc  557042  382306  915624  786236  413817  491687.194245
2     pqr  951348  723503  587397  141975  567826  581009.769784