If I have a table of test results, and want to know the person who is the best and who is the worst for each subject.
The example of the input table:
Subject | Score | Name |
---|---|---|
Math | 27 | Student1 |
History | 43 | Student2 |
Math | 44 | Student3 |
History | 50 | Student1 |
Science | 7 | Student1 |
History | 10 | Student3 |
Science | 43 | Student2 |
I want to make output table like following table:
Subject | BestScr | BestStud | WorstScr | WorstStud |
---|---|---|---|---|
Math | 44 | Student3 | 27 | Student1 |
History | 50 | Student1 | 10 | Student3 |
Science | 43 | Student2 | 7 | Student1 |
Following code is what I tried:
inputTableGrouped = inputTable.groupby(['Subject'])
outputTableGrouped['BestScr'] = inputTableGrouped.Score.max()
outputTableGrouped['WorstScr'] = inputTableGrouped.Score.min()
outputTable = outputTableGrouped.reset_index()
But this way, I cannot find the way to record student name. What should I try?
CodePudding user response:
You can use GroupBy.agg
with DataFrameGroupBy.idxmax
and
DataFrameGroupBy.idxmin
for indices by maximal and minimal Score
and select mtched rows with DataFrame.loc
, convert Subject
to indices with DataFrame.add_prefix
and join together:
df1 = df.groupby('Subject', sort=False)['Score'].agg(['idxmin','idxmax'])
df11 = df.loc[df1['idxmax']].set_index('Subject').add_prefix('Best')
df22 = df.loc[df1['idxmin']].set_index('Subject').add_prefix('Worst')
df = df11.join(df22).reset_index()
print (df)
Subject BestScore BestName WorstScore WorstName
0 Math 44 Student3 27 Student1
1 History 50 Student1 10 Student3
2 Science 43 Student2 7 Student1
Or convert Name
to index, so you can use GroupBy.agg
with DataFrameGroupBy.idxmax
and
DataFrameGroupBy.idxmin
and min
with max
functions:
df = (df.set_index('Name')
.groupby('Subject', as_index=False, sort=False)
.agg(BestScr=('Score', 'max'),
BestStud= ('Score', 'idxmax'),
WorstScr=('Score', 'min'),
WorstStud=('Score', 'idxmax'))
)
print (df)
Subject BestScr BestStud WorstScr WorstStud
0 Math 44 Student3 27 Student3
1 History 50 Student1 10 Student1
2 Science 43 Student2 7 Student2
CodePudding user response:
You can use a custom groupby.agg
:
(df.groupby('Subject', as_index=False, sort=False)
.agg(**{'BestScr': ('Score', 'max'),
'BestStd': ('Score', lambda s: df.loc[s.idxmax(), 'Name']),
'WorstScr': ('Score', 'min'),
'WorstStd': ('Score', lambda s: df.loc[s.idxmin(), 'Name']),
})
)
Output:
Subject BestScr BestStd WorstScr WorstStd
0 Math 44 Student3 27 Student1
1 History 50 Student1 10 Student3
2 Science 43 Student2 7 Student1