I want to output 3 column names that have the 3 maximum values for each pandas row.
For example:
import pandas as pd
# read data
data = pd.DataFrame({
"carbohydrates": [42, 38, 39],
"Proteins": [50, 40, 45],
"Vegetables": [34, 98, 67],
"Fats and Oil": [76, 45, 90],
"Vitamins": [67, 23, 98],
"Minerals": [65, 23, 99]
df
| | carbohydrates | Proteins | Vegetables | Fats and Oil | Vitamins | Minerals |
|---:|----------------:|-----------:|-------------:|---------------:|-----------:|-----------:|
| 0 | 42 | 50 | 34 | 76 | 67 | 65 |
| 1 | 38 | 40 | 98 | 45 | 23 | 23 |
| 2 | 39 | 45 | 67 | 90 | 98 | 99 |
But I want to return:
[["Fats and Oil", "Vitamin", "Minerals"],
["Vegetables", "Fats and Oil", "Proteins"],
["Minerals", "Vitamins", "Fats and Oil"]]
Please pardon my text formatting. It is not clear to me how oto make it nice.
CodePudding user response:
Here is one way to get the (N
) largest
of each row :
N = 3
list_out = [data.iloc[row].nlargest(N).to_frame().T.columns.tolist() for row in range(0,len(data))]
# Output :
print(list_out)
[['Fats and Oil', 'Vitamins', 'Minerals'],
['Vegetables', 'Fats and Oil', 'Proteins'],
['Minerals', 'Vitamins', 'Fats and Oil']]