Here is my input DataFrame
df = pd.DataFrame([
['CA', 'LAX', 'Alice'],
['NY', 'NYC', 'Alice'],
['TX', 'HOU', 'Alice'],
['CA', 'SFO', 'Bob'],
['NY', 'NYC', 'Bob'],
['TX', 'AUS', 'Bob'],
['CA', 'LAX', 'Carol'],
['NY', 'ALB', 'Carol'],
['TX', 'HOU', 'Carol'],
['CA', 'LAX', 'Dan'],
['NY', 'NYC', 'Dan'],
['TX', 'DAL', 'Dan'],
['CA', 'SFO', 'Eve'],
['NY', 'NYC', 'Eve'],
['TX', 'AUS', 'Eve'],
['CA', 'SDO', 'Frank'],
['NY', 'NYC', 'Frank'],
['TX', 'HOU', 'Frank']],
columns=['State','City','Person'])
And here is the output I wish to get:
State City Count
CA LAX 3
SFO 2
NY ALB 1
NYC 5
TX HOU 3
AUS 2
Basically, I want to show the top 2 cities having the highest count of people, for each state.
I have tried using this groupby statement:
df.groupby(['State','City'])['Person'].count()
which generates below output
State City
CA LAX 3
SDO 1
SFO 2
NY ALB 1
NYC 5
TX AUS 2
DAL 1
HOU 3
But I am unable to come up with statements which could sort the above counts and pick the highest two counts. Would really appreciate the help!
CodePudding user response:
I think you can simplify the approach.
You can use groupby.value_counts()
, instead of count()
, to get where you are, and then chain another groupby().head(2)
to get the top 2 Cities per each state.
The reason is that value_counts()
resulting object will be in descending
order, so that the first element is the most frequently-occurring element, so you can use head(2)
as it is.
df.groupby('State')['City'].value_counts().groupby('State').head(2)
State City
CA LAX 3
SFO 2
NY NYC 5
ALB 1
TX HOU 3
AUS 2
Name: City, dtype: int64
CodePudding user response:
Your first groupby was correct, after that you want to sort your values based on State and Count.
Then you group again solely on the state and fetch the head(2). If you want, you can (re)set your index to State and City.
df2 = df.groupby(['State','City'])['Person'].count().to_frame('Count')
df2 = df2.reset_index().sort_values(['State', 'Count'], ascending=[True, False])
df2.groupby('State').head(2).set_index(['State','City'])
Output:
Count
State City
CA LAX 3
SFO 2
NY NYC 5
ALB 1
TX HOU 3
AUS 2