I'm working with pandas on a dataset and I want to get the attribute that performed more victory in the matches. I am able to created a dataframe with the groupby function. For example for the attribute "surface" that could have 3 alternative I have this dataframe:
Now I would like to have a output dataframe like:
fullname best_surface
Zuzana Zlochova Hard
Zuzanna Bednarz Clay
....
I managed to solve this with some merge for attributes that can have just two values, but it doesn't work with attribute that can have 3 or more values. The dataset is big so I have to work with pandas operations, I cannot use iters.
Thank you
CodePudding user response:
Use DataFrameGroupBy.idxmax
for indices by first maximal by column hasWon
, select rows and convert MultiIndex
to DataFrame
by MultiIndex.to_frame
:
df = df.loc[df.groupby(level='fullname')['hasWon'].idxmax()].index.to_frame(index=False)
print (df)
fullname surface
0 Zuzana Zlochova Hard
1 Zuzanna Bednarz Clay
2 Zuzanna Szczepanska Clay
3 Zvonimir Oreskovic Hard
Or convert tuples to DataFrame
in constructor:
df = pd.DataFrame(df.groupby('fullname')['hasWon'].idxmax().tolist(),
columns=['fullname','best_surface'])
print (df)
fullname best_surface
0 Zuzana Zlochova Hard
1 Zuzanna Bednarz Clay
2 Zuzanna Szczepanska Clay
3 Zvonimir Oreskovic Hard