Home > Mobile >  Find highest two numbers on every row in pandas dataframe and extract the column names
Find highest two numbers on every row in pandas dataframe and extract the column names

Time:12-09

I have a code with multiple columns and I would like to add two more, one for the highest number on the row, and another one for the second highest. However, instead of the number, I would like to show the column name where they are found.

Assume the following data frame:

import pandas as pd

df = pd.DataFrame({'A': [1, 5, 10], 'B': [2, 6, 11], 'C': [3, 7, 12], 'D': [4, 8, 13], 'E': [5, 9, 14]})

To extract the highest number on every row, I can just apply max(axis=1) like this:

df['max1'] = df[['A', 'B', 'C', 'D', 'E']].max(axis = 1)

This gets me the max number, but not the column name itself.

How can this be applied to the second max number as well?

CodePudding user response:

You can sorting values and assign top2 values:

cols = ['A', 'B', 'C', 'D', 'E']
df[['max2','max1']] = np.sort(df[cols].to_numpy(), axis=1)[:, -2:]
print (df)
    A   B   C   D   E  max2  max1
0   1   2   3   4   5     4     5
1   5   6   7   8   9     8     9
2  10  11  12  13  14    13    14

df[['max1','max2']] = np.sort(df[cols].to_numpy(), axis=1)[:, -2:][:, ::-1]

EDIT: For get top2 columns names and top2 values use:

df = pd.DataFrame({'A': [1, 50, 10], 'B': [2, 6, 11], 
                   'C': [3, 7, 12], 'D': [40, 8, 13], 'E': [5, 9, 14]})

cols = ['A', 'B', 'C', 'D', 'E']

#values in numpy array
vals = df[cols].to_numpy()

#columns names in array
cols = np.array(cols)

#get indices that would sort an array in descending order
arr = np.argsort(-vals, axis=1)

#top 2 columns names
df[['top1','top2']] = cols[arr[:, :2]]

#top 2 values
df[['max2','max1']] = vals[np.arange(arr.shape[0])[:, None], arr[:, :2]]
print (df)
    A   B   C   D   E top1 top2  max2  max1
0   1   2   3  40   5    D    E    40     5
1  50   6   7   8   9    A    E    50     9
2  10  11  12  13  14    E    D    14    13

CodePudding user response:

Another approaches to you can get first max then remove it and get max again to get the second max

import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, 15, 10], 'B': [2, 89, 11], 'C': [80, 7, 12], 'D': [4, 8, 13], 'E': [5, 9, 14]})
        
max1=df.max(axis=1)

maxcolum1=df.idxmax(axis=1)

max2 = df.replace(np.array(df.max(axis=1)),0).max(axis=1)

maxcolum2=df.replace(np.array(df.max(axis=1)),0).idxmax(axis=1)

df2 =pd.DataFrame({ 'max1': max1, 'max2': max2 ,'maxcol1':maxcolum1,'maxcol2':maxcolum2 })

df.join(df2)
  • Related