Home > OS >  Pandas Group By, Aggregate, Then Return A Different Column
Pandas Group By, Aggregate, Then Return A Different Column

Time:02-11

I have a pandas DataFrame containing baseball fielding statistics. Each row shows how many games a player has appeared at a given position over the course of his career.

Player Position Games
Brock Holt 2B 20
Brock Holt 3B 70
Ben Zobrist OF 100
Ben Zobrist 2B 15

I want to be able to return one row per player containing the position they played the most over the years. In the example above the resulting DataFrame would be:

Player Position
Brock Holt 3B
Ben Zobrist OF

I've been attempting to group by Player and aggregate by Games like so

df.groupby('Player').agg({'Games':['max']})

which simply provides the name of the player and the maximum number of games played at a position and

df.groupby('Player')['Position'].agg({'Games':['max']})

which returns the error "nested renamer is not supported."

CodePudding user response:

You could transform max and use eq to create a boolean mask where True corresponds to the most played Games row for each player. Then filter df:

msk = df.groupby('Player')['Games'].transform('max').eq(df['Games'])
out = df[msk].drop(columns='Games')

Output:

        Player Position
1   Brock Holt       3B
2  Ben Zobrist       OF

CodePudding user response:

You can also just use groupby and max directly as below:

df.groupby(['Player']).max()  

Yields:

           Position Games
Player      
Ben Zobrist OF      100
Brock Holt  3B      70
  • Related