Data:
d = {'a': [1,5], 'b': [2,4], 'c': [3,3], 'd':[4,2], 'e': [5,1]}
df = pd.DataFrame(d)
Desired Output:
d2 = {'a': [1,5], 'b': [2,4], 'c': [3,3], 'd':[4,2], 'e': [5,1], 'Top (First)': ['e','a'], 'Top (Second)': ['d','b'], 'Top (Third)': ['c','c']}
df2 = pd.DataFrame(d2)
I would like to create 3 columns - Top (First)
, Top (Second)
, Top (Third)
that return the column names of the top 3 highest values for that row. I tried an np.where
and even though that works, it becomes exponentially longer with more columns so looking for a simpler solution.
CodePudding user response:
You could use nlargest
to find the 3 largest values, then get the index of the largest values (which are column names since we apply nlargest
row-wise) and build DataFrame and join
it back to df
:
df2 = df.join(pd.DataFrame(df.apply(lambda x: x.nlargest(3).index.tolist(), axis=1).tolist(),
columns=['Top (First)','Top (Second)','Top (Third)']))
Output:
a b c d e Top (First) Top (Second) Top (Third)
0 1 2 3 4 5 e d c
1 5 4 3 2 1 a b c
CodePudding user response:
enke's answer is good enough solving the question within one library. Here's how you do with Numpy if you are feeling spicy lol.
import numpy as np
pd.concat([df,pd.DataFrame(df.apply(lambda x:list(df.columns[np.array(x).argsort()[::-1][:3]]), axis=1).to_list(), columns=['Top (First)', 'Top (Second)', 'Top (Third)'])], axis=1)
a b c d e Top (First) Top (Second) Top (Third)
0 1 2 3 4 5 e d c
1 5 4 3 2 1 a b c