I am trying to do groupby
of this table. I have multiple value columns some of them I want to sum & some I want to count.
df_p = pd.DataFrame({'H1':['A1','A1','C1','B1','A1','C1'],
'H2':['X1','Y1','Z1','X1','Y1','Z1'],
'V1':[1,2,3,4,5,6],
'V2':[10,20,30,40,50,60],
'V3':[100,200,300,400,500,600],
'V4':[1000,2000,3000,4000,5000,6000],
'V5':[11,12,13,14,15,16],
'V6':[110,120,130,140,150,160],
'V7':[1100,1200,1300,1400,1500,1600],
'V8':[1100,1200,1300,1400,1500,1600],})
I am trying to achieve that like this, which ofcourse is not a correct syntax.
df_p.groupby(['H1','H2']).agg({['V1','V2','V3']:'sum',['V4','V5','V6','V7','V8']:'count'})
I can do it like this which is too much editing & I have ~30 columns to do that way.
df_p.groupby(['H1','H2']).agg({'V1':'sum','V2':'sum','V3':'sum','V4':'sum','V5':'count','V6':'count','V7':'count','V8':'count'})
I just feel like through dictionary, list, tuple something I should be able to simplify the operation.
CodePudding user response:
Use a dictionary comprehension:
d = {'sum': ['V1','V2','V3'],'count': ['V4','V5','V6','V7','V8']}
df_p.groupby(['H1','H2']).agg({k: v for v,l in d.items() for k in l})
Or:
d = {('V1','V2','V3'):'sum', ('V4','V5','V6','V7','V8'): 'count'}
df_p.groupby(['H1','H2']).agg({k: v for t, v in d.items() for k in t})
Output:
V1 V2 V3 V4 V5 V6 V7 V8
H1 H2
A1 X1 1 10 100 1 1 1 1 1
Y1 7 70 700 2 2 2 2 2
B1 X1 4 40 400 1 1 1 1 1
C1 Z1 9 90 900 2 2 2 2 2