Get the columns name of the two largest values from pandas dataframe rows-CodePudding

I have the following pandas data frame:

    AA    BB    CC    DD    EE
----------------------------------
0   1     12    4      3     5
1   5     7     28     7     4
2   9     7     9      2     6

I would like to add a new column ("MM") and set it to be a list of the column names of the two largest values in each row, for the above data frame, the output should be:

    AA    BB    CC    DD    EE    MM
-------------------------------------------------
0   1     12    4      3     5    ['BB','EE']
1   5     7     28     7     4    ['CC','DD','BB']
2   9     7     9      2     6    ['AA','CC']

in the first row, the two largest values are: 12,5 (column 'BB' and 'EE')

How can I do that ?

Thanks

CodePudding user response：

You can use apply with nlargest and the keep='all' parameter to keep the duplicates:

df['MM'] = df.apply(lambda r: r.nlargest(2, keep='all').index.values, axis=1)

output:

   AA  BB  CC  DD  EE            MM
0   1  12   4   3   5      [BB, EE]
1   5   7  28   7   4  [CC, BB, DD]
2   9   7   9   2   6      [AA, CC]