I have a DataFrame df and I create gb = df.groupby("column1"). Now I would like to do the following:
x = gb.apply(lambda x: x["column2"].sum() / df["column2"].sum())
It works, but I would like to base everything on x alone, not on both x and df. Ideally, I expected there to be a function such as x.get_source_df, and then my solution would be:
x = gb.apply(lambda x: x["column2"].sum() / x.get_source_df()["column2"].sum())
In that case I could save this lambda function in a dictionary and reuse it for any df. Is it possible?
CodePudding user response:
You should not use apply here. You may find this interesting: a more efficient, vectorized method would be
df.groupby('column1')['column2'].sum().div(df['column2'].sum())
It works for more than one column too.
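As far as I know, a group passed to apply carries no reference back to the DataFrame it came from, so one way to get a reusable entry for a dictionary is to make the function take the DataFrame itself as an argument. The following is only a sketch: share_of_total and metrics are hypothetical names chosen for illustration, and the small df is made-up sample data.

```python
import pandas as pd

def share_of_total(df, key, col):
    """Each group's sum of `col` as a fraction of the overall sum of `col`."""
    return df.groupby(key)[col].sum().div(df[col].sum())

# Hypothetical registry of reusable metrics; each entry works on any DataFrame.
metrics = {
    "share_of_total": share_of_total,
}

# Made-up sample data, only to show the call.
df = pd.DataFrame({"column1": ["a", "a", "b"], "column2": [1, 2, 5]})
x = metrics["share_of_total"](df, "column1", "column2")
```

Because the function receives df directly, it does not depend on any variable outside its own arguments, which is what makes it safe to store and reuse.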
CodePudding user response:
From your explanation I am not sure whether you want to divide by the sum of each group or by the sum of the entire DataFrame. I assume you want to divide by the sum of each group.
Data:
import pandas as pd
df = pd.DataFrame({'name':['a']*5 + ['b']*5,
                   'year':[2001,2002,2003,2004,2005]*2,
                   'val1':[1,2,3,4,5,None,7,8,9,10],
                   'val2':[21,22,23,24,25,26,27,28,29,30]})
Use transform, then simply divide column by column:
df['sum'] = df.groupby('name')['val1'].transform(lambda g: g.sum())
df['weight'] = df['val1']/df['sum']
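To check the result, here is a minimal end-to-end sketch of the same idea. It assumes the missing value in val1 should be ignored when summing (the pandas default) and passes the string "sum" to transform instead of a lambda; group_sum and check are names introduced only for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["a"] * 5 + ["b"] * 5,
    "val1": [1, 2, 3, 4, 5, None, 7, 8, 9, 10],
})

# Per-row copy of each group's sum; NaN values are skipped by default.
group_sum = df.groupby("name")["val1"].transform("sum")

# Weight of each row within its own group; the row with a missing val1 stays NaN.
df["weight"] = df["val1"] / group_sum

# Sanity check: the weights inside each group should add up to about 1.
check = df.groupby("name")["weight"].sum()
```

If the weights should instead be relative to the whole column, dividing by df["val1"].sum() in place of group_sum gives that variant.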