I have the following pandas data frame:
AA BB CC DD EE
----------------------------------
0 1 12 4 3 5
1 5 7 28 7 4
2 9 7 9 2 6
I would like to add a new column ("MM") and set it to be a list of the column names of the two largest values in each row, for the above data frame, the output should be:
AA BB CC DD EE MM
-------------------------------------------------
0 1 12 4 3 5 ['BB','EE']
1 5 7 28 7 4 ['CC','DD','BB']
2 9 7 9 2 6 ['AA','CC']
in the first row, the two largest values are: 12,5 (column 'BB' and 'EE')
How can I do that ?
Thanks
CodePudding user response:
You can use apply
with nlargest
and the keep='all'
parameter to keep the duplicates:
df['MM'] = df.apply(lambda r: r.nlargest(2, keep='all').index.values, axis=1)
output:
AA BB CC DD EE MM
0 1 12 4 3 5 [BB, EE]
1 5 7 28 7 4 [CC, BB, DD]
2 9 7 9 2 6 [AA, CC]