How to select best row of a grouped dataframe in Pandas

Time:07-26

Let's say I have the following dataframe:

import pandas as pd

df = pd.DataFrame.from_dict({'class': ['A', 'A', 'A', 'B', 'B'],
                             'name': ['max1', 'lisa1', 'max3', 'lisa2', 'lisa3'],
                             'gender': ['m', 'f', 'm', 'f', 'f'],
                             'grade': [2, 3, 1, 2, 4]})
>>> print(df)
  class   name gender  grade
0     A   max1      m      2
1     A  lisa1      f      3
2     A   max3      m      1
3     B  lisa2      f      2
4     B  lisa3      f      4

I want to get the best student(s) of each class (a lower grade is better):

  class  grade   name gender
0     A      1   max3      m
1     B      2  lisa2      f

How would I achieve that with pandas? Also, if the best grade is not unique, how can I either select the first occurrence of the best grade or aggregate the tied rows into a list?

To clarify, assume that lisa3 has a grade of 2 instead of 4. Then the result I wish for would be:

  • First occurrence: same as above
  • aggregating into list:
  class  grade            name  gender
0     A      1          [max3]     [m]
1     B      2  [lisa2, lisa3]  [f, f]

CodePudding user response:

You can use pandas.DataFrame.groupby to group the students by class, get the minimum grade of each class with transform, keep only the rows that match it, and then aggregate the remaining rows of each class into lists with pandas.Series.tolist.

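import pandas as pd

df = pd.DataFrame.from_dict({'class': ['A', 'A', 'A', 'B', 'B'],
                             'name': ['max1', 'lisa1', 'max3', 'lisa2', 'lisa3'],
                             'gender': ['m', 'f', 'm', 'f', 'f'],
                             'grade': [2, 3, 1, 2, 2]})

# Keep only the rows whose grade equals the minimum grade of their class
temp = df[df.grade == df.groupby(['class'])['grade'].transform('min')]
# Collect the remaining columns of each (class, grade) group into lists
temp.groupby(['class', 'grade']).agg(pd.Series.tolist)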

This gives you the expected output:

                       name  gender
class grade                        
A     1              [max3]     [m]
B     2      [lisa2, lisa3]  [f, f]
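The result above carries class and grade in the index, whereas the layout requested in the question has them as ordinary columns. A minimal sketch of one way to flatten it, assuming the same data with lisa3 at grade 2 (the names best and flat are only illustrative):

import pandas as pd

df = pd.DataFrame({'class': ['A', 'A', 'A', 'B', 'B'],
                   'name': ['max1', 'lisa1', 'max3', 'lisa2', 'lisa3'],
                   'gender': ['m', 'f', 'm', 'f', 'f'],
                   'grade': [2, 3, 1, 2, 2]})

# Keep only the rows whose grade equals the minimum grade of their class
best = df[df['grade'] == df.groupby('class')['grade'].transform('min')]

# Aggregate the tied rows into lists, then move the group keys
# from the index back into regular columns
flat = best.groupby(['class', 'grade']).agg(list).reset_index()
print(flat)

reset_index() turns both index levels into columns, so flat has the columns class, grade, name and gender, in that order.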

transform('min') computes the minimum of each group and returns an object with the same index as the original, filled with the transformed values. Called on a single column, as here, it returns a Series (DataFrameGroupBy.transform on several columns returns a DataFrame). Every row therefore holds the minimum grade of the class it belongs to.

So df.groupby(['class'])['grade'].transform('min') gives us:

0    1
1    1
2    1
3    2
4    2

where 1 and 2 are the minimum values of their respective group, replicated for each row belonging to that group.
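The question also asks for the first occurrence of the best grade, which the list aggregation does not cover. One possible approach, sketched here rather than taken from the answer, is idxmin, which returns the index label of the first row holding the minimum in each group (first_best is an illustrative name):

import pandas as pd

df = pd.DataFrame({'class': ['A', 'A', 'A', 'B', 'B'],
                   'name': ['max1', 'lisa1', 'max3', 'lisa2', 'lisa3'],
                   'gender': ['m', 'f', 'm', 'f', 'f'],
                   'grade': [2, 3, 1, 2, 2]})

# For each class, idxmin gives the index label of the first row
# that holds the minimum grade; .loc then selects those rows
first_best = df.loc[df.groupby('class')['grade'].idxmin()]
print(first_best)

With lisa2 and lisa3 tied at grade 2, this keeps lisa2 because she appears first in the dataframe.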

CodePudding user response:

Try this:

df.groupby(['class','grade']).agg(list).groupby('class').head(1)

Output:

                       name  gender
class grade                        
A     1              [max3]     [m]
B     2      [lisa2, lisa3]  [f, f]
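This one-liner relies on groupby sorting its keys in ascending order by default, so the lowest grade is the first row of each class. A short sketch that splits it into two steps to make the intermediate result visible (per_grade and best are illustrative names):

import pandas as pd

df = pd.DataFrame({'class': ['A', 'A', 'A', 'B', 'B'],
                   'name': ['max1', 'lisa1', 'max3', 'lisa2', 'lisa3'],
                   'gender': ['m', 'f', 'm', 'f', 'f'],
                   'grade': [2, 3, 1, 2, 2]})

# Step 1: one row per (class, grade) pair, with names and genders as lists.
# The keys are sorted ascending because sort=True is the groupby default.
per_grade = df.groupby(['class', 'grade']).agg(list)
print(per_grade)

# Step 2: the first row of each class is its lowest grade
best = per_grade.groupby('class').head(1)
print(best)

If a higher grade were better, head(1) would pick the wrong row; tail(1) would be needed instead.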

CodePudding user response:

You can use groupby with apply:

df.groupby("class", as_index=False).apply(lambda x: x[x['grade'].eq(x['grade'].min())].\
                                          groupby(['class', 'grade'], as_index=False).\
                                          agg(list)).reset_index(drop=True)

  class  grade            name  gender
0     A      1          [max3]     [m]
1     B      2  [lisa2, lisa3]  [f, f]