Home > Software design >  Groupby aggregate multiple columns with same function
Groupby aggregate multiple columns with same function

Time:12-22

I am trying to do groupbyof this table. I have multiple value columns some of them I want to sum & some I want to count.

df_p = pd.DataFrame({'H1':['A1','A1','C1','B1','A1','C1'],
                     'H2':['X1','Y1','Z1','X1','Y1','Z1'],
                     'V1':[1,2,3,4,5,6],
                     'V2':[10,20,30,40,50,60],
                     'V3':[100,200,300,400,500,600],
                     'V4':[1000,2000,3000,4000,5000,6000],
                     'V5':[11,12,13,14,15,16],
                     'V6':[110,120,130,140,150,160],
                     'V7':[1100,1200,1300,1400,1500,1600],
                     'V8':[1100,1200,1300,1400,1500,1600],})

I am trying to achieve that like this, which ofcourse is not a correct syntax.

df_p.groupby(['H1','H2']).agg({['V1','V2','V3']:'sum',['V4','V5','V6','V7','V8']:'count'})

I can do it like this which is too much editing & I have ~30 columns to do that way.

df_p.groupby(['H1','H2']).agg({'V1':'sum','V2':'sum','V3':'sum','V4':'sum','V5':'count','V6':'count','V7':'count','V8':'count'})

I just feel like through dictionary, list, tuple something I should be able to simplify the operation.

CodePudding user response:

Use a dictionary comprehension:

d = {'sum': ['V1','V2','V3'],'count': ['V4','V5','V6','V7','V8']}

df_p.groupby(['H1','H2']).agg({k: v for v,l in d.items() for k in l})

Or:

d = {('V1','V2','V3'):'sum', ('V4','V5','V6','V7','V8'): 'count'}

df_p.groupby(['H1','H2']).agg({k: v for t, v in d.items() for k in t})

Output:

       V1  V2   V3  V4  V5  V6  V7  V8
H1 H2                                 
A1 X1   1  10  100   1   1   1   1   1
   Y1   7  70  700   2   2   2   2   2
B1 X1   4  40  400   1   1   1   1   1
C1 Z1   9  90  900   2   2   2   2   2
  • Related