Pandas - Conditionally finding max of row according to column value while maintaining index order

Time:11-22

I'm trying to find, hopefully, a one-liner to accomplish the following:

I have the following dataframe:

import pandas as pd
import numpy as np

SIZE = 10
df = pd.DataFrame({'col1': np.random.randint(100, size=SIZE),
                   'col2': np.random.randint(100, size=SIZE),
                   'col3': np.random.randint(100, size=SIZE),
                   'col4': np.random.randint(2, size=SIZE)})
print(df)

outputting

   col1  col2  col3  col4
0    55    96    40     0
1    82    59    34     1
2    85    66    25     1
3    90    69    27     0
4    36    32    79     1
5    33    69    80     1
6    11    53    88     0
7    31    51    96     0
8    89    76    88     1
9     4    76    47     0

I'm currently ignoring col4 and calculating the max value of each row as follows:

df[['col1', 'col2', 'col3']].max(axis=1)

resulting in

0    96
1    82
2    85
3    90
4    79
5    80
6    88
7    96
8    89
9    76
dtype: int64

I want to use col4 to conditionally calculate the value for each row. If col4 is 0, take the value of col1; otherwise, take the max of ['col2', 'col3']. I also want to keep the same index/order as the original dataframe.

The end result would be

0    55    # col1
1    59    # max(col2, col3)
2    66    # max(col2, col3)
3    90    # col1
4    79    # max(col2, col3)
5    80    # max(col2, col3)
6    11    # col1
7    31    # col1
8    88    # max(col2, col3)
9    4     # col1
dtype: int64

One possibility would be to split the rows into two new dataframes, calculate the max for each, and join them again, but this could mess up the index order (I guess I could save and restore that too). Any better ideas?

Apologies if this question was already asked, but I couldn't find it with the search terms I tried.

CodePudding user response:

A vectorized way would be:

out = df['col1'].where(df['col4'].eq(0), df[['col2', 'col3']].max(axis=1))

Or:

out = df[['col2', 'col3']].max(axis=1)
out.loc[df['col4'].eq(0)] = df['col1']

Output (shown for the first variant; the second gives the same values but without the Name: col1 label):

0    55
1    59
2    66
3    90
4    79
5    80
6    11
7    31
8    88
9     4
Name: col1, dtype: int64
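
To check both variants against the sample data, here is a minimal sketch. It hard-codes the values printed in the question instead of drawing random ones, and the names mask, row_max, out_where and out_loc are only illustrative:

```python
import pandas as pd

# Sample data copied from the dataframe printed in the question
df = pd.DataFrame({
    'col1': [55, 82, 85, 90, 36, 33, 11, 31, 89, 4],
    'col2': [96, 59, 66, 69, 32, 69, 53, 51, 76, 76],
    'col3': [40, 34, 25, 27, 79, 80, 88, 96, 88, 47],
    'col4': [0, 1, 1, 0, 1, 1, 0, 0, 1, 0],
})

mask = df['col4'].eq(0)                      # True where col1 should be used
row_max = df[['col2', 'col3']].max(axis=1)   # max(col2, col3) for every row

# Variant 1: keep col1 where the mask is True, otherwise take row_max
out_where = df['col1'].where(mask, row_max)

# Variant 2: start from row_max and overwrite the masked rows with col1
out_loc = row_max.copy()
out_loc.loc[mask] = df['col1']

print(out_where.tolist())  # [55, 59, 66, 90, 79, 80, 11, 31, 88, 4]
print(out_where.index.equals(df.index))  # True: index and order are unchanged
```

Both variants operate on whole columns and align on the index, so nothing has to be split and re-joined.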

CodePudding user response:

There might be a better option... but this does the job by simply applying your rule as a lambda row-wise:

df.apply(lambda x: x[["col2", "col3"]].max() if x["col4"] else x["col1"], axis=1)
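
A row-wise apply calls the lambda once per row, so it is generally slower than a column-wise expression on large frames. As a rough sketch of a vectorized equivalent, np.where can be swapped in (this is not part of the answer above). It returns a plain NumPy array, so the result is wrapped in a Series with the original index. The four rows of data and the names by_apply and by_where are only for illustration:

```python
import numpy as np
import pandas as pd

# First four rows of the question's sample data
df = pd.DataFrame({
    'col1': [55, 82, 85, 90],
    'col2': [96, 59, 66, 69],
    'col3': [40, 34, 25, 27],
    'col4': [0, 1, 1, 0],
})

# Row-wise version, as in the answer above
by_apply = df.apply(
    lambda x: x[['col2', 'col3']].max() if x['col4'] else x['col1'],
    axis=1,
)

# Vectorized alternative using np.where (swapped in for this sketch);
# np.where drops the index, so it is re-attached explicitly
by_where = pd.Series(
    np.where(df['col4'].eq(0), df['col1'], df[['col2', 'col3']].max(axis=1)),
    index=df.index,
)

print(by_apply.tolist())  # [55, 59, 66, 90]
print(by_where.tolist())  # [55, 59, 66, 90]
```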