I have a pandas dataframe with many columns. Now I want to group by values of a column and then return max
of another specified column, and also return the values of the remaining columns corresponding to the rows where that max
occurred. Below is one example
import pandas as pd
dat = pd.DataFrame({'A' : ['Y', 'Y', 'Z'] , 'X1' : [10, 20, 30], 'X2' : [11, 10, 20], 'X3' : [20, 15, 60]})
dat.groupby('A').agg({'X1' : 'max'})
X1
A
Y 20
Z 30
As you see it did not return other columns i.e. X2, X3
etc. I want to get values of those column where the max
occurred i.e. {10, 15} and {20, 60}
. In column X1
, there are no repeated values within a group. In my actual dataframe I have many columns so a general approach would be very helpful
CodePudding user response:
idxmax
returns the index where the max
appears, so we can make use of that by taking the whole row where X1
's max is in each group.
df.loc[df.groupby('A')['X1'].idxmax()]
Output:
A X1 X2 X3
1 Y 20 10 15
2 Z 30 20 60