Home > Blockchain >  Get index that has max column value on a multi index groupped by dataframe
Get index that has max column value on a multi index groupped by dataframe

Time:11-12

I'm working with pandas on a dataset and I want to get the attribute that performed more victory in the matches. I am able to created a dataframe with the groupby function. For example for the attribute "surface" that could have 3 alternative I have this dataframe:

My dataframe

Now I would like to have a output dataframe like:

fullname              best_surface
Zuzana Zlochova       Hard
Zuzanna Bednarz       Clay
....

I managed to solve this with some merge for attributes that can have just two values, but it doesn't work with attribute that can have 3 or more values. The dataset is big so I have to work with pandas operations, I cannot use iters.

Thank you

CodePudding user response:

Use DataFrameGroupBy.idxmax for indices by first maximal by column hasWon, select rows and convert MultiIndex to DataFrame by MultiIndex.to_frame:

df = df.loc[df.groupby(level='fullname')['hasWon'].idxmax()].index.to_frame(index=False)
print (df)
              fullname surface
0      Zuzana Zlochova    Hard
1      Zuzanna Bednarz    Clay
2  Zuzanna Szczepanska    Clay
3   Zvonimir Oreskovic    Hard

Or convert tuples to DataFrame in constructor:

df = pd.DataFrame(df.groupby('fullname')['hasWon'].idxmax().tolist(), 
                  columns=['fullname','best_surface'])
print (df)
              fullname best_surface
0      Zuzana Zlochova         Hard
1      Zuzanna Bednarz         Clay
2  Zuzanna Szczepanska         Clay
3   Zvonimir Oreskovic         Hard
  • Related