Given data:
grp | data1 | data2 | data3 |
---|---|---|---|
a | 2 | 1 | 2 |
a | 4 | 6 | 3 |
b | 3 | 2 | 1 |
b | 7 | 3 | 5 |
Expected output:
grp | sum(data1) | sum(data2)/sum(data1) | sum(data3)/sum(data1) |
---|---|---|---|
a | 6 | 1.166666667 | 0.83 |
a | 10 | 0.5 | 0.6 |
Assume custom aggregation can be dependent on multiple columns and not always a simple division operation. I know using SQL query it's possible, but I am interested in an answer with apply and aggregate function if possible.
CodePudding user response:
You could do:
df2 = df.groupby('grp').sum()
df2.update(df2.drop(columns='data1').div(df2['data1'], axis=0))
or more classically:
df2 = df.groupby('grp').sum()
df2[['data2', 'data3']] = df2[['data2', 'data3']].div(df2['data1'], axis=0)
output:
data1 data2 data3
grp
a 6 1.166667 0.833333
b 10 0.500000 0.600000
multiple aggregation
You can use the same logic with a MultiIndex:
df2 = df.groupby('grp').agg(['sum', 'mean'])
df2.update(df2.drop(columns='data1').stack(0).div(df2['data1'], axis=0).unstack())
output:
data1 data2 data3
sum mean sum mean sum mean
grp
a 6 3.0 7 3.5 5 2.5
b 10 5.0 5 2.5 6 3.0
CodePudding user response:
You can use groupby assign here to generate required aggregations. You can apply whatever aggregate function is needed.
g = df.groupby('grp')
# for custom agg func use .agg(custom_agg_func)
# ^^^^^
g[['data1']].agg('sum').assign(sum2 = lambda df: g['data2'].sum()/df['data1'],
sum3 = lambda df: g['data3'].sum()/df['data1'])
# ^^^^^^
# you can use custom agg func of your choice
data1 sum2 sum3
grp
a 6 1.166667 0.833333
b 10 0.500000 0.600000