Home > Enterprise >  Get names of column with max value (n_largest) for every row on pandas dataframe
Get names of column with max value (n_largest) for every row on pandas dataframe

Time:12-06

I want to output 3 column names that have the 3 maximum values for each pandas row.

For example:

import pandas as pd

# read data
data = pd.DataFrame({
    "carbohydrates": [42, 38, 39],
    "Proteins": [50, 40, 45],
    "Vegetables": [34, 98, 67],
    "Fats and Oil": [76, 45, 90],
    "Vitamins": [67, 23, 98],
    "Minerals": [65, 23, 99]

df

|    |   carbohydrates |   Proteins |   Vegetables |   Fats and Oil |   Vitamins |   Minerals |
|---:|----------------:|-----------:|-------------:|---------------:|-----------:|-----------:|
|  0 |              42 |         50 |           34 |             76 |         67 |         65 |
|  1 |              38 |         40 |           98 |             45 |         23 |         23 |
|  2 |              39 |         45 |           67 |             90 |         98 |         99 |


But I want to return:

[["Fats and Oil", "Vitamin", "Minerals"], 
["Vegetables", "Fats and Oil", "Proteins"], 
["Minerals", "Vitamins", "Fats and Oil"]]

Please pardon my text formatting. It is not clear to me how oto make it nice.

CodePudding user response:

Here is one way to get the (N) largest of each row :

N = 3

list_out = [data.iloc[row].nlargest(N).to_frame().T.columns.tolist() for row in range(0,len(data))]

# Output :

print(list_out)

[['Fats and Oil', 'Vitamins', 'Minerals'],
 ['Vegetables', 'Fats and Oil', 'Proteins'],
 ['Minerals', 'Vitamins', 'Fats and Oil']]
  • Related