Generating a separate column that stores weighted average per group

It shouldn't be that hard, but I can't figure this problem out.

Imagine I have a long-format DataFrame and want to calculate, for each person, the average of 'score' weighted by the manager's weight ('manager_weight'), and keep it as a separate column, 'w_mean_m'.

df['w_mean_m'] = df.groupby('person')['score'].transform(lambda x: np.average(x['score'], weights=x['manager_weight']))

throws an error and I have no idea how to fix it.
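To see why the attempt fails, here is a small diagnostic sketch on a hypothetical three-row frame (the data are made up for illustration). It shows that transform passes each group of the selected 'score' column to the lambda as a plain Series, so there is no 'score' or 'manager_weight' column inside it to index.

```python
import pandas as pd

# Hypothetical three-row frame with the columns named in the question
df = pd.DataFrame({
    "person": ["a", "a", "b"],
    "score": [1, 2, 3],
    "manager_weight": [1.0, 3.0, 2.0],
})

# transform() calls the lambda once per group with that group's 'score'
# values as a Series, not with a DataFrame holding both columns
kinds = df.groupby("person")["score"].transform(lambda x: type(x).__name__)
print(kinds.tolist())  # ['Series', 'Series', 'Series']

# Indexing that Series by a column name is a label lookup on its integer
# index, so it raises KeyError before np.average is ever reached
try:
    df.groupby("person")["score"].transform(lambda x: x["score"])
    raised = False
except KeyError:
    raised = True
print(raised)  # True
```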

Answer:

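Because GroupBy.transform works on one column at a time, the lambda only ever receives a single Series (here, the 'score' values of one group), so it cannot select other columns such as 'manager_weight'. Instead, use GroupBy.apply, which passes each group as a DataFrame, and then Series.map to broadcast the per-group results back into a new column (in the sample data below, the grouping column is 'contact' rather than 'person'):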

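# one weighted average per contact (a Series indexed by contact)
s = (df.groupby('contact')
       .apply(lambda x: np.average(x['score'], weights=x['manager_weight'])))
# broadcast each group's value back to all of its rows
df['w_mean_m'] = df['contact'].map(s)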
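A different technique that avoids apply altogether (not part of the original answer, offered as an alternative sketch): a weighted mean is sum(w * x) / sum(w), and both sums can be computed with the built-in transform('sum'), which already returns values aligned to the original rows. This runs on the setup data from this answer; the output column name 'w_mean_m2' is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "manager_weight": [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0],
    "score": [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4],
    "contact": ["a", "a", "a", "a", "b", "b", "b", "c", "c", "c", "c"],
})

# Numerator: per-group sum of weight * score, repeated on every row of the group
weighted = df["score"] * df["manager_weight"]
num = weighted.groupby(df["contact"]).transform("sum")

# Denominator: per-group sum of the weights, aligned the same way
den = df.groupby("contact")["manager_weight"].transform("sum")

# Hypothetical column name, chosen so it does not clash with the others
df["w_mean_m2"] = num / den
print(df)
```

Because every step is a built-in groupby reduction, this form avoids calling a Python lambda per group, which tends to matter only on large frames.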

Alternatively, a workaround keeps transform: once the index is unique, each group's weights can be looked up in the original DataFrame by that group's index labels:

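# the weights are looked up by label, so the index must be unique
df = df.reset_index(drop=True)

# x is one group's 'score' Series; fetch the matching weights from df by its index
f = lambda x: np.average(x, weights=df.loc[x.index, "manager_weight"])
df['w_mean_m1'] = df.groupby('contact')['score'].transform(f)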


print(df)
    manager_weight  score contact  w_mean_m1
0              1.0      1       a   1.282609
1              1.1      1       a   1.282609
2              1.2      1       a   1.282609
3              1.3      2       a   1.282609
4              1.4      2       b   2.355556
5              1.5      2       b   2.355556
6              1.6      3       b   2.355556
7              1.7      3       c   3.770270
8              1.8      4       c   3.770270
9              1.9      4       c   3.770270
10             2.0      4       c   3.770270
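As a hand check of the table: the weighted mean is the weight-scaled sum of scores divided by the sum of the weights. For contact 'a' (rows 0 to 3):

```latex
\bar{x}_w = \frac{\sum_i w_i x_i}{\sum_i w_i}
          = \frac{1.0 \cdot 1 + 1.1 \cdot 1 + 1.2 \cdot 1 + 1.3 \cdot 2}{1.0 + 1.1 + 1.2 + 1.3}
          = \frac{5.9}{4.6} \approx 1.282609
```

This matches the w_mean_m1 value shown for rows 0 to 3.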

Setup:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "manager_weight": [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0],
        "score": [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4],
        "contact": ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'c']
    })
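The reset_index(drop=True) step in the transform workaround matters because the weights are looked up by index label. A sketch on a hypothetical four-row frame with a duplicated index (as pd.concat can produce) shows what goes wrong without it: df.loc returns every row matching a label, so a two-row group receives four weights and np.average raises a TypeError because the shapes differ.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index labels repeat (e.g. after pd.concat)
df = pd.DataFrame(
    {"manager_weight": [1.0, 2.0, 3.0, 4.0],
     "score": [1, 2, 3, 4],
     "contact": ["a", "a", "b", "b"]},
    index=[0, 1, 0, 1],
)

# Group 'a' has labels [0, 1], but each label matches two rows,
# so the lookup returns 4 weights for 2 scores
n_weights = len(df.loc[[0, 1], "manager_weight"])
print(n_weights)  # 4


def f(x):
    # Reads the module-level df at call time
    return np.average(x, weights=df.loc[x.index, "manager_weight"])


try:
    df.groupby("contact")["score"].transform(f)
    failed = False
except TypeError:  # np.average: shapes of a and weights differ
    failed = True
print(failed)  # True

# With unique labels each group gets exactly its own weights
df = df.reset_index(drop=True)
df["w_mean_m1"] = df.groupby("contact")["score"].transform(f)
print(df)
```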