Home > OS >  Columns names of top n values of each row in a pandas DataFrame
Columns names of top n values of each row in a pandas DataFrame

Time:03-03

Data:
d = {'a': [1,5], 'b': [2,4], 'c': [3,3], 'd':[4,2], 'e': [5,1]}
df = pd.DataFrame(d)

Desired Output:

d2 = {'a': [1,5], 'b': [2,4], 'c': [3,3], 'd':[4,2], 'e': [5,1], 'Top (First)': ['e','a'], 'Top (Second)': ['d','b'], 'Top (Third)': ['c','c']}
df2 = pd.DataFrame(d2)

I would like to create 3 columns - Top (First), Top (Second), Top (Third) that return the column names of the top 3 highest values for that row. I tried an np.where and even though that works, it becomes exponentially longer with more columns so looking for a simpler solution.

CodePudding user response:

You could use nlargest to find the 3 largest values, then get the index of the largest values (which are column names since we apply nlargest row-wise) and build DataFrame and join it back to df:

df2 = df.join(pd.DataFrame(df.apply(lambda x: x.nlargest(3).index.tolist(), axis=1).tolist(), 
                           columns=['Top (First)','Top (Second)','Top (Third)']))

Output:

   a  b  c  d  e Top (First) Top (Second) Top (Third)
0  1  2  3  4  5           e            d           c
1  5  4  3  2  1           a            b           c

CodePudding user response:

enke's answer is good enough solving the question within one library. Here's how you do with Numpy if you are feeling spicy lol.

import numpy as np

pd.concat([df,pd.DataFrame(df.apply(lambda x:list(df.columns[np.array(x).argsort()[::-1][:3]]), axis=1).to_list(),  columns=['Top (First)', 'Top (Second)', 'Top (Third)'])], axis=1)

   a  b  c  d  e Top (First) Top (Second) Top (Third)
0  1  2  3  4  5           e            d           c
1  5  4  3  2  1           a            b           c
  • Related