Home > OS >  Finding the max value in accordance with other columns
Finding the max value in accordance with other columns

Time:05-29

I have students names, scores in different subjects, subjects names. I want to add a column to the data frame, which contains the subject in which each student had the highest score. Here is the data:

Input data would be:

enter image description here

Output data (the result data frame) would be :

enter image description here

My try at this (didn't work obviously):

Data['Subject with highest score'] = Data.groupby(['Names','Subject'])[['Scores']].transform(lambda x: x.max())

CodePudding user response:

Sort the values by Scores, then group the dataframe by Names and transform the column Subject with last

df['S(max)'] = df.sort_values('Scores').groupby('Names')['Subject'].transform('last')

Alternatively, we can group the dataframe by Names then transform Scores with idxmax to broadcast the indices corresponding to row having max Score, then use those indices to get the corresponding rows from Subject column

df['S(max)'] = df.loc[df.groupby('Names')['Scores'].transform('idxmax'), 'Subject'].tolist()

      Names  Scores    Subject     S(max)
0       Dan      98       Math       Math
1       Dan      88    English       Math
2       Dan      90    Biology       Math
3       Bob      80       Math  Chemistry
4       Bob      93  Chemistry  Chemistry
5       Bob      70     Sports  Chemistry
6       Bob      85     French  Chemistry
7   Michael     100    History    History
8    Sandra      67     French     French
9   Michael      89       Math    History
10  Michael      74     Sports    History
11    Jacky      65    Biology    Physics
12    Jacky     100    Physics    Physics
13    Jacky      90   Geometry    Physics
14    Jacky      87  Geography    Physics
15    Jacky      69       Math    Physics
16      Dan      73     Sports       Math
17   Sandra      50    History     French
  • Related