I have a dataframe and need to group the data based on some columns.
Static way:
dfMMS_BBMS_pv.columns
['area_type', 'area_name', 'area_code', 'date', 'A_BBMS',
'A_MMS', 'Others_BBMS', 'Others_MMS', 'B_BBMS',
'C_BBMS', 'C_MMS', 'T_BBMS', 'V_BBMS',
'D_BBMS', 'D_MMS']
dfMMS_BBMS_pv = dfMMS_BBMS_pv.groupby(['area_type', 'area_name', 'area_code']).agg(
    {'date': lambda x: list(x),
     'A_MMS': lambda x: list(round(x, 2))})
My question is: how can I make this aggregation dynamic, so that it applies to every column whose name matches a specific pattern?
Interested_Cols=dfMMS_BBMS_pv.filter(regex='BBMS|MMS').columns
dfMMS_BBMS_pv=dfMMS_BBMS_pv.groupby(['area_type', 'area_name', 'area_code']).\
agg({'date': lambda x: list((x)) ,[i: lambda x : list(round(x,2)) for i in
list(Interested_Cols)]
To clarify, the date column should be present.
Error I get:
CodePudding user response:
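Build the final dictionary before aggregating by merging both dicts, then pass it to GroupBy.agg. A list comprehension cannot be placed inside a dict literal, so the per-column entries have to come from a dict comprehension: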
d = {**{'date': lambda x: list(x)},
**{i: lambda x: list(round(x,2)) for i in Interested_Cols}}
dfMMS_BBMS_pv=dfMMS_BBMS_pv.groupby(['area_type', 'area_name', 'area_code']).agg(d)
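
For anyone who wants to try this on a small example, here is a minimal sketch of the same approach. The toy data, the values, and the names df, group_cols and agg_map are made up for illustration; only the column naming pattern and the group keys come from the question.

```python
import pandas as pd

# Toy frame (hypothetical values) with the same kind of columns as in the question
df = pd.DataFrame({
    'area_type': ['nation', 'nation', 'region'],
    'area_name': ['X', 'X', 'Y'],
    'area_code': ['E1', 'E1', 'E2'],
    'date': ['2021-01-01', '2021-01-02', '2021-01-01'],
    'A_BBMS': [1.234, 2.346, 3.456],
    'A_MMS': [0.111, 0.222, 0.333],
})

group_cols = ['area_type', 'area_name', 'area_code']

# Columns whose names contain BBMS or MMS
interested_cols = df.filter(regex='BBMS|MMS').columns

# Start with the fixed entry for 'date', then add one entry per matched column
agg_map = {'date': lambda s: list(s)}
agg_map.update({col: (lambda s: list(s.round(2))) for col in interested_cols})

result = df.groupby(group_cols).agg(agg_map)
print(result)
```

Each cell of result holds a Python list with the values of that group, and the date column is kept because it has its own entry in the dictionary.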