Generating a separate column that stores weighted average per group

Time:09-16

This shouldn't be that hard, but I can't get it to work.

Imagine I have a long-format DataFrame and want to calculate a weighted average of score per person, weighted by manager_weight, and keep it as a separate column, 'w_mean_m'.

df['w_mean_m'] = df.groupby('person')['score'].transform(lambda x: np.average(x['score'], weights=x['manager_weight']))

throws an error and I have no idea how to fix it.

CodePudding user response:

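Because GroupBy.transform processes each column separately, the function cannot access several columns at once. Use GroupBy.apply to compute one value per group, then Series.map to write it to a new column (the sample data below uses 'contact' as the grouping column):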

s = (df.groupby('contact')
       .apply(lambda x: np.average(x['score'], weights=x['manager_weight'])))
df['w_mean_m'] = df['contact'].map(s)
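
To see why the original attempt fails, here is a minimal sketch that records what transform passes to the function. The tiny DataFrame and the helper name record are made up for illustration. Each call receives a Series holding only the selected 'score' values of one group, so x['score'] becomes a label lookup on that Series and 'manager_weight' is not reachable from x at all:

```python
import pandas as pd

# Hypothetical toy data, only for this illustration.
df = pd.DataFrame(
    {
        "person": ["a", "a", "b"],
        "score": [1, 2, 3],
        "manager_weight": [1.0, 2.0, 1.0],
    })

seen = []

def record(x):
    # Remember the type and name of the object transform hands over.
    seen.append((type(x).__name__, x.name))
    return x  # return the group unchanged so transform still succeeds

df.groupby("person")["score"].transform(record)
print(seen)
```

Every recorded entry is a Series, never a DataFrame, which is why the weights have to come from somewhere else (apply on the whole group, or a lookup by index).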

A hack that keeps transform is also possible: look up the weights for each group by its index labels. This requires a unique index, hence the reset_index:

df = df.reset_index(drop=True)

f = lambda x: np.average(x, weights=df.loc[x.index, "manager_weight"])
df['w_mean_m1'] = df.groupby('contact')['score'].transform(f)


print (df)
    manager_weight  score contact  w_mean_m1
0              1.0      1       a   1.282609
1              1.1      1       a   1.282609
2              1.2      1       a   1.282609
3              1.3      2       a   1.282609
4              1.4      2       b   2.355556
5              1.5      2       b   2.355556
6              1.6      3       b   2.355556
7              1.7      3       c   3.770270
8              1.8      4       c   3.770270
9              1.9      4       c   3.770270
10             2.0      4       c   3.770270
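
A third option, not part of the answer above, is to avoid the Python-level lambda entirely: a weighted average is sum(score * weight) / sum(weight), and both sums can be broadcast per group with transform('sum'). The sketch below assumes the same sample data; the column name w_mean_vec is chosen here for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "manager_weight": [1.0,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2.0],
        "score": [1,1,1,2,2,2,3,3,4,4,4],
        "contact": ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'c']
    })

# Numerator: per-group sum of score * weight, repeated on every row of the group.
weighted = df["score"] * df["manager_weight"]
num = weighted.groupby(df["contact"]).transform("sum")

# Denominator: per-group sum of the weights, repeated the same way.
den = df.groupby("contact")["manager_weight"].transform("sum")

df["w_mean_vec"] = num / den
print(df)
```

This should give the same values as the apply/map version and may be faster on large frames, since it only uses built-in aggregations.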

Setup:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "manager_weight": [1.0,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2.0],
        "score": [1,1,1,2,2,2,3,3,4,4,4],
        "contact": ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'c']
    })