I'm trying to find the highest salary within each decade for every given name. Below is my sample dataset and the line of code
| Bins | Name | Salary |
|-------|------|--------|
| 1990s | Ron | 8000 |
| 1990s | Ron | 8500 |
| 2000s | Hary | 7000 |
| 1980s | Ron | 6800 |
| 2010s | John | 10000 |
| 2010s | John | 21000 |
df.sort_values(by='Salary', ascending=False).groupby('Bins').reset_index()
But I keep getting error: "need to rewrap column in Series to apply key function"
My desired output should look like:
| Bins | Name | Salary |
|-------|------|--------|
| 1980s | Ron | 6800 |
| 1990s | Ron | 8500 |
| 2000s | Hary | 7000 |
| 2010s | John | 21000 |
CodePudding user response:
This should work:
df.groupby('Bins').max().reset_index()
CodePudding user response:
You can group by ['Bins', 'Name']
and use GroupBy.first()
to get the highest value within the group (since you have sorted the values in descending order, the first entry within the group is the one with the highest value):
df.sort_values(by='Salary', ascending=False).groupby(['Bins', 'Name']).first().reset_index()
Result:
Bins Name Salary
0 1980s Ron 6800
1 1990s Ron 8500
2 2000s Hary 7000
3 2010s John 21000
Another way to do it is to group by ['Bins', 'Name']
and use GroupBy.max()
df.groupby(['Bins', 'Name'], as_index=False)['Salary'].max()
Same result.