Home > Software design >  Get column index of max value in pandas row
Get column index of max value in pandas row

Time:12-19

I want to find not just the max value in a dataframe row, but also the specific column that has that value. If there are multiple columns with the value, then either returning the list of all columns, or just one, are both fine.

In this case, I'm specifically concerned with doing this for a single given row, but if there is a solution that can apply to a dataframe, that would be great as well.

Below is a rough idea of what I mean. row.max() returns the max value, but my desired function row.max_col() returns the name of the column that has the max value.

>>> import pandas as pd
>>> df = pd.DataFrame({"A": [1,2,3], "B": [4,5,6]})
>>> row = df.iloc[0]
>>> row.max()
4
>>> row.max_col()
Index(['B'], dtype='object')

My current approach is this:

>>> row.index[row.eq(row.max())]
Index(['B'], dtype='object')

I'm not familiar with how pandas optimizes everything so I apologize if I'm wrong here, but I assume that row.index[row.eq(...)] grows in linear time proportional to the number of columns. I'm working with a small number of columns, so it shouldn't be a huge issue, but I'm curious if there is a way to get the column name the same way that I can use .max() without having to do the extra work afterwards to look for equal values.

CodePudding user response:

Use idxmax:

>>> df
   A  B
0  1  4
1  2  5
2  3  6

>>> df.iloc[0].idxmax()
'B'

CodePudding user response:

Assume that the source DataFrame contains:

   A  B
0  1  4
1  7  5
2  3  6
3  9  8

Then, to find the column name holding the max value in each row (not only row 0), run:

result = df.apply('idxmax', axis=1)

The result is:

0    B
1    A
2    B
3    A
dtype: object

But if you want to get the integer index of the column holding the max value, change the above code to:

result = df.columns.get_indexer(df.apply('idxmax', axis=1))

This time the result is:

array([1, 0, 1, 0], dtype=int64)
  • Related