Suppose I have the following DataFrame:
import pandas as pd

data = {'id': [1, 1, 2, 2, 2, 3, 3, 3],
        'var': [10, 12, 8, 11, 13, 14, 15, 16],
        'ave': [2, 2, 1, 4, 3, 5, 6, 8]}
df = pd.DataFrame(data)
I am trying to compute con = var * (ave / sum(ave)), where sum(ave) is taken within each id group, and then assign the result as a new column in my existing DataFrame.
I tried to define the operation with the code below, but I cannot figure out what the problem is.
df =df["id"].map( df.groupby(['id']).
apply(lambda x: x[var]*(x[ave])/x[ave].sum())
My expected output would look like this:
id var ave con
1 1 10 2 5
2 1 12 2 6
3 2 8 1 1
4 2 11 4 5.5
5 2 13 3 4.88
6 3 14 5 3.68
7 3 15 6 4.74
8 3 16 8 6.74
Thank you in advance.
CodePudding user response:
Don't use apply; use a vectorized expression with groupby.transform('sum'):
df['con'] = df['var'].mul(df['ave'].div(df.groupby('id')['ave'].transform('sum')))
# or
# df['con'] = df['var']*df['ave']/df.groupby('id')['ave'].transform('sum')
Output:
id var ave con
0 1 10 2 5.000000
1 1 12 2 6.000000
2 2 8 1 1.000000
3 2 11 4 5.500000
4 2 13 3 4.875000
5 3 14 5 3.684211
6 3 15 6 4.736842
7 3 16 8 6.736842
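
To see why transform fits here, it may help to look at the intermediate values. The sketch below rebuilds the DataFrame from the question and compares transform('sum') with a plain per-group sum mapped back through the id column. The names group_total, per_group and mapped_total are only illustrative and do not come from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "id":  [1, 1, 2, 2, 2, 3, 3, 3],
    "var": [10, 12, 8, 11, 13, 14, 15, 16],
    "ave": [2, 2, 1, 4, 3, 5, 6, 8],
})

# transform('sum') broadcasts each group's total back to every row,
# so the result has the same index and length as df.
group_total = df.groupby("id")["ave"].transform("sum")

# A plain sum() yields one value per id instead; mapping it through
# the id column gives the same per-row totals in two steps.
per_group = df.groupby("id")["ave"].sum()
mapped_total = df["id"].map(per_group)

# With row-aligned totals, the formula is ordinary column arithmetic.
df["con"] = df["var"] * df["ave"] / group_total

print(group_total.tolist())
print(df)
```

Because group_total lines up row by row with df, no lambda or apply is needed, and the assignment cannot be misaligned. The map variant is closer to the attempt in the question: it works as long as the thing being mapped is one value per id rather than a per-row result.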