group by two columns and get the max of a third

Time:10-17

I have a df like the one below:

import pandas as pd

d = {'date': ['2021-02-25','2021-02-25','2021-02-25','2021-02-26','2021-02-26','2021-02-26'],
     'label': ['A','B','J','A','B','J'],
     'count': [5,1,4,6,3,2]}
dff = pd.DataFrame(d)
print(dff)
         date label  count
0  2021-02-25     A      5
1  2021-02-25     B      1
2  2021-02-25     J      4
3  2021-02-26     A      6
4  2021-02-26     B      3
5  2021-02-26     J      2

For each date, I want to find the label that had the maximum count. Grouping by date and taking the max of count gives me this result:

dff.groupby(['date'])['count'].max()
date
2021-02-25    5
2021-02-26    6

However, I also want the result to include the label that had the max count. Essentially, I want something like this:

   date      label  count
2021-02-25     A      5
2021-02-26     A      6

 

CodePudding user response:

Compute each date's maximum count with groupby and transform, then keep only the rows whose count equals that maximum:

dff.loc[dff['count'] == dff.groupby('date')['count'].transform('max')]
Out[436]: 
         date label  count
0  2021-02-25     A      5
3  2021-02-26     A      6
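
To make the filter easier to follow, here is a self-contained sketch of the same idea split into two steps. The intermediate names daily_max and result are only illustrative and are not part of the original answer. transform('max') returns a Series with the same index as dff, so it can be compared row by row against the count column.

```python
import pandas as pd

dff = pd.DataFrame({
    'date': ['2021-02-25'] * 3 + ['2021-02-26'] * 3,
    'label': ['A', 'B', 'J', 'A', 'B', 'J'],
    'count': [5, 1, 4, 6, 3, 2],
})

# Each row receives the maximum count of its own date,
# so daily_max has the same length and index as dff.
daily_max = dff.groupby('date')['count'].transform('max')

# Keep the rows whose count equals the maximum for their date.
result = dff.loc[dff['count'] == daily_max]
print(result)
```

Note that if two labels share the maximum count on the same date, this filter keeps both rows.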

Alternatively, you can sort by the count column before grouping, then take the last row of each group, which holds that date's largest count:

dff.sort_values('count').groupby('date', as_index=False).nth(-1)
Out[444]: 
         date label  count
0  2021-02-25     A      5
3  2021-02-26     A      6
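
Neither snippet above puts date in the index the way the desired output in the question does. As a possible third option (a different technique from the two shown in the answer, offered only as a sketch), groupby with idxmax returns the index label of the largest count per date, which can then be used to select whole rows. The names top_rows and result are illustrative.

```python
import pandas as pd

dff = pd.DataFrame({
    'date': ['2021-02-25'] * 3 + ['2021-02-26'] * 3,
    'label': ['A', 'B', 'J', 'A', 'B', 'J'],
    'count': [5, 1, 4, 6, 3, 2],
})

# For each date, idxmax gives the index label of the first row
# holding the largest count; loc then selects those rows.
top_rows = dff.loc[dff.groupby('date')['count'].idxmax()]

# Move date into the index to mirror the layout asked for in the question.
result = top_rows.set_index('date')[['label', 'count']]
print(result)
```

Unlike the equality filter, idxmax keeps exactly one row per date, choosing the first occurrence when counts are tied.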