I am ranking each row within a groupby. I want to return only the rows where the largest "Rank" occurs. In this example, the only rows I want to return are those where "Rank" is the largest within each State group.
import pandas as pd
import numpy as np
data = {'Product':['Box','Bottles','Pen','Markers','Bottles','Pen','Markers','Bottles','Box','Markers','Markers','Pen'],
'State':['Alaska','California','Texas','North Carolina','California','Texas','Alaska','Texas','North Carolina','Alaska','California','Texas'],
'Sales':[14,24,31,12,13,7,9,31,18,16,18,14]}
df1=pd.DataFrame(data, columns=['Product','State','Sales'])
df1
df1['Rank'] = df1.groupby(['State'])['Sales'].cumcount().add(1)
CodePudding user response:
Use:
In [1001]: df1[df1['Rank'].eq(df1.groupby('State')['Rank'].transform('max'))]
Out[1001]:
    Product           State  Sales  Rank
8       Box  North Carolina     18     2
9   Markers          Alaska     16     3
10  Markers      California     18     3
11      Pen           Texas     14     4
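As a cross-check, here is a minimal sketch that rebuilds the question's frame and compares the `transform('max')` filter with two alternatives. It assumes Rank is built with `cumcount` as in the question, in which case the largest rank in each State is simply that group's last row, so `tail(1)` should select the same rows. The variable names `by_transform`, `by_tail` and `by_idxmax` are mine, for illustration only.

```python
import pandas as pd

data = {
    'Product': ['Box', 'Bottles', 'Pen', 'Markers', 'Bottles', 'Pen',
                'Markers', 'Bottles', 'Box', 'Markers', 'Markers', 'Pen'],
    'State': ['Alaska', 'California', 'Texas', 'North Carolina', 'California', 'Texas',
              'Alaska', 'Texas', 'North Carolina', 'Alaska', 'California', 'Texas'],
    'Sales': [14, 24, 31, 12, 13, 7, 9, 31, 18, 16, 18, 14],
}
df1 = pd.DataFrame(data, columns=['Product', 'State', 'Sales'])

# Rank = position of each row within its State, in order of appearance (1-based)
df1['Rank'] = df1.groupby('State').cumcount().add(1)

# Approach from the answer: keep rows whose Rank equals the group's max Rank
by_transform = df1[df1['Rank'].eq(df1.groupby('State')['Rank'].transform('max'))]

# Alternative 1: with a cumcount-based Rank, the max is the last row of each group
by_tail = df1.groupby('State').tail(1)

# Alternative 2: look up the index label of the max Rank per State
# (sort_index restores the original row order for comparison)
by_idxmax = df1.loc[df1.groupby('State')['Rank'].idxmax()].sort_index()

print(by_transform)
```

If Rank were computed some other way (for example with ties), `tail(1)` would no longer be equivalent, and the `transform('max')` filter is the safer choice because it keeps every tied row.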
CodePudding user response:
I'm not exactly sure what the desired output should look like, but following your requirements, the code below should work. It will give you only the max Rank on a per-State, per-Product basis.
>>> df1.groupby(['State','Product'], as_index=False).max()
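To make the behaviour of that call concrete, here is a small sketch run against the question's data. Two things are worth noting, and both are my reading of how `groupby(...).max()` works rather than something stated in the answer: the result has one row per State/Product pair (not one per State), and `max()` is taken column by column, so a result row can combine a Sales value and a Rank value that come from different original rows. The names `per_pair` and `texas_pen` are illustrative.

```python
import pandas as pd

data = {
    'Product': ['Box', 'Bottles', 'Pen', 'Markers', 'Bottles', 'Pen',
                'Markers', 'Bottles', 'Box', 'Markers', 'Markers', 'Pen'],
    'State': ['Alaska', 'California', 'Texas', 'North Carolina', 'California', 'Texas',
              'Alaska', 'Texas', 'North Carolina', 'Alaska', 'California', 'Texas'],
    'Sales': [14, 24, 31, 12, 13, 7, 9, 31, 18, 16, 18, 14],
}
df1 = pd.DataFrame(data, columns=['Product', 'State', 'Sales'])
df1['Rank'] = df1.groupby('State').cumcount().add(1)

# One row per (State, Product) pair; each column is maximised independently
per_pair = df1.groupby(['State', 'Product'], as_index=False).max()
print(per_pair)

# Texas/Pen appears three times in the input: (31, 1), (7, 2), (14, 4).
# The aggregated row takes Sales from the first and Rank from the last.
texas_pen = per_pair[(per_pair['State'] == 'Texas') & (per_pair['Product'] == 'Pen')]
print(texas_pen)
```

If the goal is whole original rows at the largest Rank per State, filtering on `transform('max')` keeps the Sales and Rank values from the same row.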