Home > database >  Select value of a column that corresponds to highest value of another column with groupby in pandas
Select value of a column that corresponds to highest value of another column with groupby in pandas

Time:03-02

I have the following pandas dataframe

import pandas as pd
foo = pd.DataFrame({'id': [1,1,2,2], 'perc':[0.1,0.2,0.3,0.4], 'category':['a','b','b','a']})

I would like to create an extra column which will be the category that corresponds to the highest perc by id.

The resulting dataframe looks like:

foo = pd.DataFrame({'id': [1,1,2,2], 'perc':[0.1,0.2,0.3,0.4], 'category':['a','b','b','a'], 'category_max':['b','b','a','a']})

Any ideas how I can do that ?

CodePudding user response:

Use GroupBy.transform with convert category to index, so for maximal caegory by perc per group use DataFrameGroupBy.idxmax, last for assign back is necessary convert ouput to numpy array for prevent alignment (because is different index after transform like in original df):

foo['category_max'] = (foo.set_index('category')
                          .groupby('id')['perc']
                          .transform('idxmax')
                          .to_numpy())
print (foo)
   id  perc category category_max
0   1   0.1        a            b
1   1   0.2        b            b
2   2   0.3        b            a
3   2   0.4        a            a

Or is possible mapping by Series.map by aggregate values by DataFrameGroupBy.idxmax:

foo['category_max'] = foo['id'].map(foo.set_index('category')
                                       .groupby('id')['perc']
                                       .idxmax())
print (foo)
   id  perc category category_max
0   1   0.1        a            b
1   1   0.2        b            b
2   2   0.3        b            a
3   2   0.4        a            a

CodePudding user response:

You could use a GroupBy.transform('idxmax'):

foo['category_max'] = (foo
                       .set_index('category')
                       .groupby('id')['perc']
                       .transform('idxmax').values
                      )

output:

   id  perc category category_max
0   1   0.1        a            b
1   1   0.2        b            b
2   2   0.3        b            a
3   2   0.4        a            a
  • Related