How to return the rows with the largest value from a group by in Pandas?

I am ranking each row within a group by. I want to return only the rows where the largest "Rank" occurs. In this example, the only rows I want to return are the ones where "Rank" is the largest within each individual State grouping.

import pandas as pd
import numpy as np
 
data = {'Product':['Box','Bottles','Pen','Markers','Bottles','Pen','Markers','Bottles','Box','Markers','Markers','Pen'], 
        'State':['Alaska','California','Texas','North Carolina','California','Texas','Alaska','Texas','North Carolina','Alaska','California','Texas'], 
        'Sales':[14,24,31,12,13,7,9,31,18,16,18,14]}
 
df1=pd.DataFrame(data, columns=['Product','State','Sales']) 
df1

df1['Rank'] = df1.groupby(['State'])['Sales'].cumcount().add(1)

CodePudding user response：

Use:

In [1001]: df1[df1['Rank'].eq(df1.groupby('State')['Rank'].transform('max'))]
Out[1001]: 
    Product           State  Sales  Rank
8       Box  North Carolina     18     2
9   Markers          Alaska     16     3
10  Markers      California     18     3
11      Pen           Texas     14     4
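For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea, using the data from the question. The names `mask`, `result` and `last_rows` are just illustrative. The `tail(1)` variant is an assumption-based shortcut: it only matches because `Rank` was built with `cumcount`, so the last row of each State is also the one with the largest rank.

```python
import pandas as pd

data = {'Product': ['Box', 'Bottles', 'Pen', 'Markers', 'Bottles', 'Pen',
                    'Markers', 'Bottles', 'Box', 'Markers', 'Markers', 'Pen'],
        'State': ['Alaska', 'California', 'Texas', 'North Carolina', 'California', 'Texas',
                  'Alaska', 'Texas', 'North Carolina', 'Alaska', 'California', 'Texas'],
        'Sales': [14, 24, 31, 12, 13, 7, 9, 31, 18, 16, 18, 14]}

df1 = pd.DataFrame(data, columns=['Product', 'State', 'Sales'])

# Number the rows within each State in order of appearance (1, 2, 3, ...)
df1['Rank'] = df1.groupby('State').cumcount().add(1)

# Broadcast each State's maximum Rank back onto every row of that State,
# then keep only the rows whose Rank equals that maximum
mask = df1['Rank'].eq(df1.groupby('State')['Rank'].transform('max'))
result = df1[mask]

# Shortcut that relies on Rank coming from cumcount:
# the last row of each State carries the largest Rank
last_rows = df1.groupby('State').tail(1)

print(result)
```

If "Rank" were computed some other way (for example from the Sales values), the `transform('max')` filter would still apply, while the `tail(1)` shortcut would not.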

CodePudding user response：

I'm not exactly sure what the desired output should look like, but following your requirements, the code below should work. It will give you only the max rank on a per-State, per-Product basis.

>>> df1.groupby(['State','Product'], as_index=False).max()
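To make the difference concrete, here is a rough sketch comparing the per-State/per-Product aggregation with a one-row-per-State selection. It assumes the question's data; `per_pair` and `per_state` are hypothetical names, and the `idxmax` lookup is one possible way to get a single row per State, not necessarily what the asker intended.

```python
import pandas as pd

data = {'Product': ['Box', 'Bottles', 'Pen', 'Markers', 'Bottles', 'Pen',
                    'Markers', 'Bottles', 'Box', 'Markers', 'Markers', 'Pen'],
        'State': ['Alaska', 'California', 'Texas', 'North Carolina', 'California', 'Texas',
                  'Alaska', 'Texas', 'North Carolina', 'Alaska', 'California', 'Texas'],
        'Sales': [14, 24, 31, 12, 13, 7, 9, 31, 18, 16, 18, 14]}

df1 = pd.DataFrame(data, columns=['Product', 'State', 'Sales'])
df1['Rank'] = df1.groupby('State').cumcount().add(1)

# One row per (State, Product) pair; max() is taken column by column,
# so Sales and Rank in a row may come from different original rows
per_pair = df1.groupby(['State', 'Product'], as_index=False).max()

# One row per State: look up the index label of the largest Rank in each
# State and select those original rows unchanged
per_state = df1.loc[df1.groupby('State')['Rank'].idxmax()]

print(per_pair)
print(per_state)
```

Note that `per_pair` has one row for every State/Product combination, while `per_state` keeps exactly one original row per State.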