Due a dataframe like:
COUNTRY_CODE | PDP_SOURCE | TREAT_SLIT | ATC_METRIC |
---|---|---|---|
AR | Aisles | ex_1 | 12.78 |
AR | Aisles | ex_2 | 11.28 |
AR | Aisles | ex_3 | 11.96 |
AR | Favorites | ex_1 | 12.78 |
AR | Favorites | ex_2 | 12.28 |
AR | Favorites | ex_3 | 13.96 |
BR | Favorites | ex_1 | 11.2 |
BR | Favorites | ex_2 | 10.28 |
BR | Favorites | ex_3 | 10.96 |
I need to groupby this by COUNTRY_CODE,PDP_SOURCE and find the TREATMENT_SPLIT which has the max ATC_METRIC value. Is there a way to do this with .groupby in pandas? the result should be something like:
COUNTRY_CODE | PDP_SOURCE | TREAT_SLIT | ATC_METRIC |
---|---|---|---|
AR | Aisles | ex_1 | 12.78 |
AR | Favorites | ex_3 | 13.96 |
BR | Favorites | ex_1 | 11.2 |
CodePudding user response:
here is one way to do it
# use groupby to get the index where ATC_METRIC is max, then
# use loc to return those rows
df.loc[df.groupby(['COUNTRY_CODE','PDP_SOURCE'] )['ATC_METRIC'] .idxmax()]
Result
COUNTRY_CODE PDP_SOURCE TREAT_SLIT ATC_METRIC
0 AR Aisles ex_1 12.78
5 AR Favorites ex_3 13.96
6 BR Favorites ex_1 11.20