I have a df like the one below:
import pandas as pd

d = {'date': ['2021-02-25', '2021-02-25', '2021-02-25', '2021-02-26', '2021-02-26', '2021-02-26'],
     'label': ['A', 'B', 'J', 'A', 'B', 'J'],
     'count': [5, 1, 4, 6, 3, 2]}
dff = pd.DataFrame(d)
print(dff)
         date label  count
0  2021-02-25     A      5
1  2021-02-25     B      1
2  2021-02-25     J      4
3  2021-02-26     A      6
4  2021-02-26     B      3
5  2021-02-26     J      2
What I want to do is group by date and get the label that had the max count for each day. I do the following and get this result:
dff.groupby(['date'])['count'].max()
date
2021-02-25 5
2021-02-26 6
However, I also want the result to include the label that had the max count. Essentially, I want something like this:
date        label
2021-02-25  A        5
2021-02-26  A        6
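A tempting shortcut is to take the max of every column per date, but a minimal check (rebuilding dff from the question) shows why that does not return the label you want: max is applied to each column independently, so label becomes the alphabetically largest string in the group rather than the label of the max-count row.

```python
import pandas as pd

dff = pd.DataFrame({
    'date': ['2021-02-25'] * 3 + ['2021-02-26'] * 3,
    'label': ['A', 'B', 'J', 'A', 'B', 'J'],
    'count': [5, 1, 4, 6, 3, 2],
})

# max() runs column by column: 'count' is correct, but 'label' is the
# lexicographically largest string per date ('J'), not the max-count row's label
naive = dff.groupby('date').max()
print(naive)
```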
Answer:
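Compute each date's max count with groupby and transform, which broadcasts it back onto every row of that date, then keep only the rows whose count equals it: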
dff.loc[dff['count'] == dff.groupby('date')['count'].transform('max')]
         date label  count
0  2021-02-25     A      5
3  2021-02-26     A      6
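To make the filter step concrete, here is a minimal sketch that prints the intermediate transform result and then checks what happens on a tie. The tie is hypothetical (B's count on 2021-02-26 is changed to 6 purely for illustration); because the filter keeps every row equal to the group max, both tied rows survive.

```python
import pandas as pd

dff = pd.DataFrame({
    'date': ['2021-02-25'] * 3 + ['2021-02-26'] * 3,
    'label': ['A', 'B', 'J', 'A', 'B', 'J'],
    'count': [5, 1, 4, 6, 3, 2],
})

# transform('max') returns a Series aligned with dff: each row gets its date's max
day_max = dff.groupby('date')['count'].transform('max')
print(day_max.tolist())  # [5, 5, 5, 6, 6, 6]

# Boolean mask keeps rows whose count equals their date's max
top = dff.loc[dff['count'] == day_max]

# Hypothetical tie: B on 2021-02-26 now matches A's count of 6
tied = dff.copy()
tied.loc[4, 'count'] = 6
tied_top = tied.loc[tied['count'] == tied.groupby('date')['count'].transform('max')]
print(tied_top)  # both A and B are kept for 2021-02-26
```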
Alternatively, sort by the count column before grouping and keep the last (largest) row of each group:
dff.sort_values('count').groupby('date', as_index=False).nth(-1)
         date label  count
0  2021-02-25     A      5
3  2021-02-26     A      6
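Another option, not from the answer above, is groupby plus idxmax, which returns exactly one row per date (the first occurrence of the max, so ties resolve to the earliest row). A minimal sketch that also reshapes the result into the (date, label) layout shown in the question:

```python
import pandas as pd

dff = pd.DataFrame({
    'date': ['2021-02-25'] * 3 + ['2021-02-26'] * 3,
    'label': ['A', 'B', 'J', 'A', 'B', 'J'],
    'count': [5, 1, 4, 6, 3, 2],
})

# idxmax gives, per date, the index label of the first row with the max count
idx = dff.groupby('date')['count'].idxmax()
top = dff.loc[idx]

# Reshape into a Series indexed by (date, label), like the desired output
result = top.set_index(['date', 'label'])['count']
print(result)
```

The sort_values/nth approach also keeps one row per date, but which tied row wins depends on the sort order; sort_values defaults to quicksort, which is not stable, so passing kind='stable' makes the choice deterministic.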