I have a problem trying to calculate some final tests marks. I need to group by Students, getting only the highest value in each column for each student.
Being DF the dataframe:
data = {'Students': ['Student1', 'Student1', 'Student1', 'Student2','Student2','Studen3'],
'Result1': [2, 4, 5, 8, 2, 5],
'Result2': [5, 3, 2, 8, 5, 5],
'Result3': [7, 5, 7, 3, 8, 9]}
df = pd.DataFrame(data)
Students Result1 Result2 Result3
0 Student1 2 5 7
1 Student1 4 3 5
2 Student1 5 2 7
3 Student2 8 8 3
4 Student2 2 5 8
5 Studen3 5 5 9
I need to generate a DF choosing the higher mark, for each student, in each Result.
So, the final DF should look like:
Students Result1 Result2 Result3
0 Student1 5 5 7
1 Student2 8 8 8
2 Student3 5 5 9
Any help?
CodePudding user response:
The dataframe can be generated using simply iterations over groups:
df2 = pd.DataFrame(columns=('Student', 'res1', 'res2', 'res3'))
for s in df.Students.unique():
stdf = df[df["Students"]==s]
df2 = df2.append({'Student':s,'res1':max(stdf.Result1),'res2':max(stdf.Result2),
'res3':max(stdf.Result3)}, ignore_index=True)
CodePudding user response:
Works calling groupby('Students').max()
>>> df.groupby('Students').max()
Result1 Result2 Result3
Students
Student1 5 5 7
Student2 8 8 8
Student3 5 5 9