Home > Mobile >  Leave a column what I wanted after 'groupby' and calculation
Leave a column what I wanted after 'groupby' and calculation

Time:12-15

If I have a table of test results, and want to know the person who is the best and who is the worst for each subject.

The example of the input table:

Subject Score Name
Math 27 Student1
History 43 Student2
Math 44 Student3
History 50 Student1
Science 7 Student1
History 10 Student3
Science 43 Student2

I want to make output table like following table:

Subject BestScr BestStud WorstScr WorstStud
Math 44 Student3 27 Student1
History 50 Student1 10 Student3
Science 43 Student2 7 Student1

Following code is what I tried:

inputTableGrouped = inputTable.groupby(['Subject'])

outputTableGrouped['BestScr'] = inputTableGrouped.Score.max()
outputTableGrouped['WorstScr'] = inputTableGrouped.Score.min()

outputTable = outputTableGrouped.reset_index()

But this way, I cannot find the way to record student name. What should I try?

CodePudding user response:

You can use GroupBy.agg with DataFrameGroupBy.idxmax and DataFrameGroupBy.idxmin for indices by maximal and minimal Score and select mtched rows with DataFrame.loc, convert Subject to indices with DataFrame.add_prefix and join together:

df1 = df.groupby('Subject', sort=False)['Score'].agg(['idxmin','idxmax'])

df11 = df.loc[df1['idxmax']].set_index('Subject').add_prefix('Best')
df22 = df.loc[df1['idxmin']].set_index('Subject').add_prefix('Worst')

df = df11.join(df22).reset_index()
print (df)
   Subject  BestScore  BestName  WorstScore WorstName
0     Math         44  Student3          27  Student1
1  History         50  Student1          10  Student3
2  Science         43  Student2           7  Student1

Or convert Name to index, so you can use GroupBy.agg with DataFrameGroupBy.idxmax and DataFrameGroupBy.idxmin and min with max functions:

df = (df.set_index('Name')
        .groupby('Subject', as_index=False, sort=False)
        .agg(BestScr=('Score', 'max'),
            BestStud= ('Score', 'idxmax'),
            WorstScr=('Score', 'min'),
            WorstStud=('Score', 'idxmax'))
)
print (df)
   Subject  BestScr  BestStud  WorstScr WorstStud
0     Math       44  Student3        27  Student3
1  History       50  Student1        10  Student1
2  Science       43  Student2         7  Student2

CodePudding user response:

You can use a custom groupby.agg:

(df.groupby('Subject', as_index=False, sort=False)
    .agg(**{'BestScr': ('Score', 'max'),
            'BestStd': ('Score', lambda s: df.loc[s.idxmax(), 'Name']),
            'WorstScr': ('Score', 'min'),
            'WorstStd': ('Score', lambda s: df.loc[s.idxmin(), 'Name']),
                            })
)

Output:

   Subject  BestScr   BestStd  WorstScr  WorstStd
0     Math       44  Student3        27  Student1
1  History       50  Student1        10  Student3
2  Science       43  Student2         7  Student1
  • Related