Applying Groupby and aggregation on a set of dynamically selected columns

Time:11-03

I have a dataframe and need to group the data based on some columns.

Static way:

dfMMS_BBMS_pv.columns
['area_type', 'area_name', 'area_code', 'date', 'A_BBMS',
   'A_MMS', 'Others_BBMS', 'Others_MMS', 'B_BBMS',
   'C_BBMS', 'C_MMS', 'T_BBMS', 'V_BBMS',
   'D_BBMS', 'D_MMS']

dfMMS_BBMS_pv = dfMMS_BBMS_pv.groupby(['area_type', 'area_name', 'area_code']).agg(
    {'date': lambda x: list(x), 'A_MMS': lambda x: list(round(x, 2))})

My question is: how can I make this aggregation dynamic, based on a set of column names that match a specific pattern?

Interested_Cols=dfMMS_BBMS_pv.filter(regex='BBMS|MMS').columns

dfMMS_BBMS_pv=dfMMS_BBMS_pv.groupby(['area_type', 'area_name', 'area_code']).\
agg({'date': lambda x: list((x)) ,[i: lambda x : list(round(x,2)) for i in       
list(Interested_Cols)]

To clarify, the date column must still be included in the aggregated result.

Error I get:

[screenshot of the error message; image not available]

CodePudding user response:

Build the final dictionary before aggregating by merging both dicts, then pass it to GroupBy.agg:

d = {**{'date': lambda x: list(x)},
     **{i: lambda x: list(round(x,2)) for i in Interested_Cols}}

dfMMS_BBMS_pv=dfMMS_BBMS_pv.groupby(['area_type', 'area_name', 'area_code']).agg(d)
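
The approach above can be tried end to end on a small frame. This is a minimal sketch, not the asker's real data: the sample values and the names df, group_cols, interested_cols and agg_map are made up for illustration, and only the column naming pattern comes from the question. It also shows why the original attempt fails: a list comprehension of `i: lambda ...` pairs cannot sit inside a dict literal, whereas a dict comprehension merged into the dict can.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question, values are invented.
df = pd.DataFrame({
    "area_type": ["region", "region", "region"],
    "area_name": ["North", "North", "South"],
    "area_code": ["N1", "N1", "S1"],
    "date": ["2021-01-01", "2021-01-02", "2021-01-01"],
    "A_BBMS": [1.234, 2.346, 3.456],
    "A_MMS": [0.111, 0.222, 0.333],
})

group_cols = ["area_type", "area_name", "area_code"]

# Pick the value columns by name pattern, as in the question.
interested_cols = df.filter(regex="BBMS|MMS").columns

# Merge the fixed 'date' entry with one entry per matched column.
agg_map = {
    "date": lambda x: list(x),
    **{col: (lambda x: list(x.round(2))) for col in interested_cols},
}

result = df.groupby(group_cols).agg(agg_map)
print(result)
```

Each cell of result holds a Python list for its group, with 'date' left unrounded and every matched column rounded to two decimals. Adding or removing BBMS/MMS columns in the input changes agg_map automatically, with no edits to the groupby call.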