Pandas group by multiple custom aggregate function on multiple columns-CodePudding

Given data:

grp	data1	data2	data3
a	2	1	2
a	4	6	3
b	3	2	1
b	7	3	5

Expected output:

grp	sum(data1)	sum(data2)/sum(data1)	sum(data3)/sum(data1)
a	6	1.166666667	0.83
a	10	0.5	0.6

Assume custom aggregation can be dependent on multiple columns and not always a simple division operation. I know using SQL query it's possible, but I am interested in an answer with apply and aggregate function if possible.

CodePudding user response：

You could do:

df2 = df.groupby('grp').sum()
df2.update(df2.drop(columns='data1').div(df2['data1'], axis=0))

or more classically:

df2 = df.groupby('grp').sum()
df2[['data2', 'data3']] = df2[['data2', 'data3']].div(df2['data1'], axis=0)

output:

     data1     data2     data3
grp                           
a        6  1.166667  0.833333
b       10  0.500000  0.600000

multiple aggregation

You can use the same logic with a MultiIndex:

df2 = df.groupby('grp').agg(['sum', 'mean'])
df2.update(df2.drop(columns='data1').stack(0).div(df2['data1'], axis=0).unstack())

output:

    data1      data2      data3     
      sum mean   sum mean   sum mean
grp                                 
a       6  3.0     7  3.5     5  2.5
b      10  5.0     5  2.5     6  3.0

CodePudding user response：

You can use groupby assign here to generate required aggregations. You can apply whatever aggregate function is needed.

g = df.groupby('grp')
#                                         for custom agg func use .agg(custom_agg_func)
#                                                          ^^^^^
g[['data1']].agg('sum').assign(sum2 = lambda df: g['data2'].sum()/df['data1'],
                               sum3 = lambda df: g['data3'].sum()/df['data1'])
#                ^^^^^^
#    you can use custom agg func of your choice

     data1      sum2      sum3
grp                           
a        6  1.166667  0.833333
b       10  0.500000  0.600000