I have a dataframe that looks like
name performance year
bob 50 2002
bob 90 2005
bob 82 2010
joey 50 2015
joey 85 2013
joey 37 1990
sarah 90 1994
sarah 95 2020
sarah 35 2013
I would like groupby name
and compute average performance
while only displaying the top two results in descending order by performance.
I am currently doing df.groupby(['name']).mean()
but this computes the averages of both performance
as well as year
while displaying all 3 names in alphabetical order (I would only like to display the top 2 in descending order by performance avg).
CodePudding user response:
here is my solution, basically was missing one field in the group by method.
Code:
import pandas as pd
# defining columns
cols = ['name', 'performance', 'year']
# defining data
data = [
['bob', 50, 2002]
, ['bob', 90, 2005]
, ['bob', 82, 2010]
, ['joey', 50, 2015]
, ['joey', 85, 2013]
, ['joey', 37, 1990]
, ['sarah', 90, 1994]
, ['sarah', 95, 2020]
, ['sarah', 35, 2013]
]
# create dataframe
df = pd.DataFrame(data, columns=cols)
# dataframe, grouped by name and year, aggregated by mean() of performance, first 2 values in descending order
df = df.groupby(['name', 'year'])['performance'].mean().sort_values(ascending=False).head(2)
# resetting index to display performance column name
df = df.reset_index()
# print dataframe
print(df)
Output:
name year performance
0 sarah 2020 95.0
1 bob 2005 90.0