Leave a column what I wanted after 'groupby' and calculation-CodePudding

If I have a table of test results, and want to know the person who is the best and who is the worst for each subject.

The example of the input table:

Subject	Score	Name
Math	27	Student1
History	43	Student2
Math	44	Student3
History	50	Student1
Science	7	Student1
History	10	Student3
Science	43	Student2

I want to make output table like following table:

Subject	BestScr	BestStud	WorstScr	WorstStud
Math	44	Student3	27	Student1
History	50	Student1	10	Student3
Science	43	Student2	7	Student1

Following code is what I tried:

inputTableGrouped = inputTable.groupby(['Subject'])

outputTableGrouped['BestScr'] = inputTableGrouped.Score.max()
outputTableGrouped['WorstScr'] = inputTableGrouped.Score.min()

outputTable = outputTableGrouped.reset_index()

But this way, I cannot find the way to record student name. What should I try?

CodePudding user response：

You can use GroupBy.agg with DataFrameGroupBy.idxmax and DataFrameGroupBy.idxmin for indices by maximal and minimal Score and select mtched rows with DataFrame.loc, convert Subject to indices with DataFrame.add_prefix and join together:

df1 = df.groupby('Subject', sort=False)['Score'].agg(['idxmin','idxmax'])

df11 = df.loc[df1['idxmax']].set_index('Subject').add_prefix('Best')
df22 = df.loc[df1['idxmin']].set_index('Subject').add_prefix('Worst')

df = df11.join(df22).reset_index()
print (df)
   Subject  BestScore  BestName  WorstScore WorstName
0     Math         44  Student3          27  Student1
1  History         50  Student1          10  Student3
2  Science         43  Student2           7  Student1

Or convert Name to index, so you can use GroupBy.agg with DataFrameGroupBy.idxmax and DataFrameGroupBy.idxmin and min with max functions:

df = (df.set_index('Name')
        .groupby('Subject', as_index=False, sort=False)
        .agg(BestScr=('Score', 'max'),
            BestStud= ('Score', 'idxmax'),
            WorstScr=('Score', 'min'),
            WorstStud=('Score', 'idxmax'))
)
print (df)
   Subject  BestScr  BestStud  WorstScr WorstStud
0     Math       44  Student3        27  Student3
1  History       50  Student1        10  Student1
2  Science       43  Student2         7  Student2

CodePudding user response：

You can use a custom groupby.agg:

(df.groupby('Subject', as_index=False, sort=False)
    .agg(**{'BestScr': ('Score', 'max'),
            'BestStd': ('Score', lambda s: df.loc[s.idxmax(), 'Name']),
            'WorstScr': ('Score', 'min'),
            'WorstStd': ('Score', lambda s: df.loc[s.idxmin(), 'Name']),
                            })
)

Output:

   Subject  BestScr   BestStd  WorstScr  WorstStd
0     Math       44  Student3        27  Student1
1  History       50  Student1        10  Student3
2  Science       43  Student2         7  Student1