Find the highest value within multiple groups in python-CodePudding

I'm trying to find the highest salary within each decade for every given name. Below is my sample dataset and the line of code

| Bins  | Name | Salary |
|-------|------|--------|
| 1990s | Ron  | 8000   |
| 1990s | Ron  | 8500   |
| 2000s | Hary | 7000   |
| 1980s | Ron  | 6800   |
| 2010s | John | 10000  |
| 2010s | John | 21000  |

df.sort_values(by='Salary', ascending=False).groupby('Bins').reset_index()

But I keep getting error: "need to rewrap column in Series to apply key function"

My desired output should look like:

| Bins  | Name | Salary |
|-------|------|--------|
| 1980s | Ron  | 6800   |
| 1990s | Ron  | 8500   |
| 2000s | Hary | 7000   |
| 2010s | John | 21000  |

CodePudding user response：

This should work:

df.groupby('Bins').max().reset_index()

CodePudding user response：

You can group by ['Bins', 'Name'] and use GroupBy.first() to get the highest value within the group (since you have sorted the values in descending order, the first entry within the group is the one with the highest value):

df.sort_values(by='Salary', ascending=False).groupby(['Bins', 'Name']).first().reset_index()

Result:

    Bins  Name  Salary
0  1980s   Ron    6800
1  1990s   Ron    8500
2  2000s  Hary    7000
3  2010s  John   21000

Another way to do it is to group by ['Bins', 'Name'] and use GroupBy.max()

df.groupby(['Bins', 'Name'], as_index=False)['Salary'].max()

Same result.