Sorting pandas groupby output-CodePudding

I have a dataframe that looks like

name   performance   year
bob      50          2002
bob      90          2005
bob      82          2010
joey     50          2015
joey     85          2013
joey     37          1990
sarah    90          1994
sarah    95          2020
sarah    35          2013

I would like groupby name and compute average performance while only displaying the top two results in descending order by performance.

I am currently doing df.groupby(['name']).mean() but this computes the averages of both performance as well as year while displaying all 3 names in alphabetical order (I would only like to display the top 2 in descending order by performance avg).

CodePudding user response：

here is my solution, basically was missing one field in the group by method.

Code:

import pandas as pd

# defining columns
cols = ['name', 'performance', 'year']

# defining data
data = [
    ['bob',   50, 2002]
,   ['bob',   90, 2005]
,   ['bob',   82, 2010]
,   ['joey',  50, 2015]
,   ['joey',  85, 2013]
,   ['joey',  37, 1990]
,   ['sarah', 90, 1994]
,   ['sarah', 95, 2020]
,   ['sarah', 35, 2013]
]

# create dataframe
df = pd.DataFrame(data, columns=cols)

# dataframe, grouped by name and year, aggregated by mean() of performance, first 2 values in descending order
df = df.groupby(['name', 'year'])['performance'].mean().sort_values(ascending=False).head(2)

# resetting index to display performance column name
df = df.reset_index()

# print dataframe
print(df)

Output:

    name  year  performance
0  sarah  2020         95.0
1    bob  2005         90.0