Home > Net >  Add weighted average column with multiple column inputs
Add weighted average column with multiple column inputs

Time:08-08

I have this dataframe:

final_df = [['Name', 'Gender', 'Amount', 'amount2', 'Amount3', 'Percent total of gender'], ['Mike', 'Male', 50, nan, 0, 0.20833333333333334], ['Nancy', 'Female', 30, nan, 0, 0.42857142857142855], ['Bob', 'Male', 100, nan, 0, 0.4166666666666667], ['Terrance', 'Male', 30, nan, 0, 0.125], ['Sara', 'Female', 40, nan, 0, 0.5714285714285714], ['Myo', 'Male', 60, nan, 0, 0.25]]

Name      Gender    Rate    Hours    Amount3 
Mike      Male      20      30       3,000.00 
Nancy     Female    10      50       1,500.00 
Bob       Male      30      40       6,000.00 
Terrance  Male      40      60       12,000.00 
Sara      Female    35      32       3,360.00 
Myo       Male      15      80       6,000.00 

I have this code for the simple average:

final_df['Weighted Average'] = final_df.groupby('Gender')['Amount3'].transform(lambda x: x/x.sum() if x.sum() > 0 else 0 )

I'm trying to add a weighted average column that will take (Rate * Hours) * (Amount3/groupby.sum())

My desired output would be:

enter image description here

Any ideas?

CodePudding user response:

Given your numbers, it looks like the expected computation is:

df['Weighted Average'] = (
df['Amount3']/(df['Rate']*df['Hours'])
*df.groupby('Gender')['Amount3'].transform(lambda x: x/x.sum() if x.sum() > 0 else 0 )
)

which, somehow is quite weird as this is equivalent to something proportional to the square of Amount3:

df['Weighted Average'] = (
    df['Amount3']**2/(df['Rate']*df['Hours'])
    /df.groupby('Gender')['Amount3'].transform('sum').fillna(0)
)

output:

       Name  Gender  Rate  Hours  Amount3  Weighted Average
0      Mike    Male    20     30   3000.0          0.555556
1     Nancy  Female    10     50   1500.0          0.925926
2       Bob    Male    30     40   6000.0          1.111111
3  Terrance    Male    40     60  12000.0          2.222222
4      Sara  Female    35     32   3360.0          2.074074
5       Myo    Male    15     80   6000.0          1.111111

CodePudding user response:

If I understand correctly you need something like:

df['Weighted Average'] = (df.Rate * df.Hours) * (df.Amount3 / df.Gender.map(df.groupby('Gender').Amount3.sum()))
  • Related