How to calculate a weighted average with pandas? [duplicate]

To keep things short, here is a simplified example:

import pandas as pd
import numpy as np

indices = (1,2,3,4,5,6)
cols = ["id", "region", "weight", "score1", "score2"]
data = (["xxx1", 1, 2, 10, 20], ["xxx2", 2, 5, 6, 8], ["xxx3", 1, 3, 9, 12], ["xxx4", 1, 3, 12, 20], ["xxx5", 1, 5, 5, 30], ["xxx6", 2, 10, 12, 20])
df = pd.DataFrame(data, index=indices, columns=cols)
df

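which looks like this:

     id  region  weight  score1  score2
1  xxx1       1       2      10      20
2  xxx2       2       5       6       8
3  xxx3       1       3       9      12
4  xxx4       1       3      12      20
5  xxx5       1       5       5      30
6  xxx6       2      10      12      20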

I want to calculate the weighted average of each score per region, using only the pandas and numpy packages. My ideal result is a 2x3 table: one row per region, with columns for the region and the weighted averages of score1 and score2. Below is my attempt:

def cal(x, w):
    assert len(x) == len(w)
    nlist = len(x)
    sum_weight = np.sum(w)
    weight_total = 0
    for i in range(nlist):
        weight_total = weight_total + x[i] * w[i]
    
    wavg = weight_total / sum_weight
    return wavg

However, this approach feels clumsy, since it loops over the rows by hand and still has to be applied to each region and each score column separately. Is there a more convenient way to achieve the same goal? Any hint or help is welcome.

the ideal result

CodePudding user response:

Try calculating the weighted total and the sum of the weights separately, then divide:

weighted_total = df.filter(like='score').mul(df.weight, axis=0).groupby(df.region).sum()
sum_weight = df.weight.groupby(df.region).sum()
weighted_total.div(sum_weight, axis=0)

           score1  score2
region                   
1        8.307692    22.0
2       10.000000    16.0
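
As a sanity check on the first row, the region 1 figures can be reproduced by hand with numpy. This is only a minimal sketch: the array names are made up for illustration, and the values are copied from the four region 1 rows of the example data.

```python
import numpy as np

# Rows of the example data that belong to region 1 (ids xxx1, xxx3, xxx4, xxx5).
weight = np.array([2, 3, 3, 5])
score1 = np.array([10, 9, 12, 5])
score2 = np.array([20, 12, 20, 30])

# Weighted average = sum(weight * score) / sum(weight)
wavg_score1 = (weight * score1).sum() / weight.sum()  # 108 / 13
wavg_score2 = (weight * score2).sum() / weight.sum()  # 286 / 13

print(wavg_score1, wavg_score2)
```

The two values should agree with the region 1 row of the table above.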

Or in a single groupby:

df.groupby('region').apply(
  lambda g: g.filter(like='score').mul(g.weight, axis=0).sum().div(g.weight.sum(), axis=0)
)
           score1  score2
region                   
1        8.307692    22.0
2       10.000000    16.0
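
Another option, sketched here rather than taken from the answer above, is to let np.average do the weighting inside each group. The helper name weighted_mean and the explicit score_cols list are illustrative assumptions; the data is the same as in the question.

```python
import numpy as np
import pandas as pd

cols = ["id", "region", "weight", "score1", "score2"]
data = [
    ["xxx1", 1, 2, 10, 20], ["xxx2", 2, 5, 6, 8], ["xxx3", 1, 3, 9, 12],
    ["xxx4", 1, 3, 12, 20], ["xxx5", 1, 5, 5, 30], ["xxx6", 2, 10, 12, 20],
]
df = pd.DataFrame(data, index=range(1, 7), columns=cols)

# Hypothetical helper: columns to average are listed explicitly.
score_cols = ["score1", "score2"]

def weighted_mean(g):
    # With axis=0, np.average weights each row of the group by its weight
    # and returns one weighted average per score column.
    values = np.average(
        g[score_cols].to_numpy(), weights=g["weight"].to_numpy(), axis=0
    )
    return pd.Series(values, index=score_cols)

# Select only the needed columns so the grouping key is not passed to the helper.
result = df.groupby("region")[score_cols + ["weight"]].apply(weighted_mean)
print(result)
```

This should give the same two rows as the tables above. Note that np.average raises ZeroDivisionError if the weights in a group sum to zero, so such groups would need separate handling.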
