Supposed a dataframe
import pandas as pd
df = pd.DataFrame({
'Model': ['A', 'B', 'C', 'D', 'E', 'F', 'G'],
'Year': [2019, 2020, 2021, 2018, 2019, 2020, 2021],
'Transmission': ['Manual', 'Automatic', 'Automatic', 'Manual', 'Automatic', 'Automatic', 'Manual'],
'EngineSize': [1.4, 2.0, 1.4, 1.5, 2.0, 1.5, 1.5],
'MPG': [55.4, 67.3, 58.9, 52.3, 64.2, 68.9, 83.1]
})
df
and I want to return the highest MPG per year plus the model. Looked like this
Year MPG Model
2018 52.3 D
2019 64.2 E
2020 68.9 F
2021 83.1 G
I'm thinking by using groupby but still stuck on how to show the Model column.
CodePudding user response:
You could use groupby
idxmax
to get the index of the max MPG of each year; then use loc
to filter:
out = df.loc[df.groupby('Year')['MPG'].idxmax(), ['Year', 'MPG', 'Model']]
Output:
Year MPG Model
3 2018 52.3 D
4 2019 64.2 E
5 2020 68.9 F
6 2021 83.1 G
CodePudding user response:
I like @enke's answer better. But you could use the groupby
apply
with pd.DataFrame.nlargest
df.groupby('Year').apply(pd.DataFrame.nlargest, n=1, columns=['MPG'])
Model Year Transmission EngineSize MPG
Year
2018 3 D 2018 Manual 1.5 52.3
2019 4 E 2019 Automatic 2.0 64.2
2020 5 F 2020 Automatic 1.5 68.9
2021 6 G 2021 Manual 1.5 83.1