I'm trying to find, hopefully, a one-liner to accomplish the following:
I have the following dataframe:
import pandas as pd
import numpy as np
SIZE = 10
df = pd.DataFrame({'col1': np.random.randint(100, size=SIZE),
                   'col2': np.random.randint(100, size=SIZE),
                   'col3': np.random.randint(100, size=SIZE),
                   'col4': np.random.randint(2, size=SIZE)})
print(df)
outputting
   col1  col2  col3  col4
0    55    96    40     0
1    82    59    34     1
2    85    66    25     1
3    90    69    27     0
4    36    32    79     1
5    33    69    80     1
6    11    53    88     0
7    31    51    96     0
8    89    76    88     1
9     4    76    47     0
I'm currently ignoring col4 and calculating the max value of each row as follows:
df[['col1', 'col2', 'col3']].max(axis=1)
resulting in
0 96
1 82
2 85
3 90
4 79
5 80
6 88
7 96
8 89
9 76
dtype: int64
I want to use col4 to conditionally calculate the max value: if col4 is 0, take the value of col1; otherwise take the max of ['col2', 'col3']. I also want to keep the same index/order of the dataframe.
The end result would be
0 55 # col1
1 59 # max(col2, col3)
2 66 # max(col2, col3)
3 90 # col1
4 79 # max(col2, col3)
5 80 # max(col2, col3)
6 11 # col1
7 31 # col1
8 88 # max(col2, col3)
9 4 # col1
dtype: int64
One possibility would be to create two new dataframes, calculate the max of each, and join them again, but this could mess up the index (I guess I could save that too). Any better ideas?
Apologies if this question has already been asked; I couldn't find it with the search terms I tried.
CodePudding user response:
A vectorized way would be:
out = df['col1'].where(df['col4'].eq(0), df[['col2', 'col3']].max(axis=1))
Or:
out = df[['col2', 'col3']].max(axis=1)
out.loc[df['col4'].eq(0)] = df['col1']
output (from the first version; the second gives the same values as an unnamed Series):
0 55
1 59
2 66
3 90
4 79
5 80
6 11
7 31
8 88
9 4
Name: col1, dtype: int64
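If you want to check this against the numbers in the question, here is a minimal sketch. It hard-codes the sample rows from the printed dataframe (the original data was random, so this is a copy, not a reproduction of the random draw) and also shows `np.where` as a possible alternative; the names `row_max`, `out_where` and `out_np` are just for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied by hand from the dataframe printed in the question.
df = pd.DataFrame({
    'col1': [55, 82, 85, 90, 36, 33, 11, 31, 89, 4],
    'col2': [96, 59, 66, 69, 32, 69, 53, 51, 76, 76],
    'col3': [40, 34, 25, 27, 79, 80, 88, 96, 88, 47],
    'col4': [0, 1, 1, 0, 1, 1, 0, 0, 1, 0],
})

# Row-wise max of col2/col3, used wherever col4 is not 0.
row_max = df[['col2', 'col3']].max(axis=1)

# Series.where keeps col1 where the condition holds, else takes row_max.
out_where = df['col1'].where(df['col4'].eq(0), row_max)

# np.where returns a plain array, so the index is re-attached explicitly.
out_np = pd.Series(np.where(df['col4'].eq(0), df['col1'], row_max),
                   index=df.index)

print(out_where.tolist())  # [55, 59, 66, 90, 79, 80, 11, 31, 88, 4]
```

Both variants keep the original index and row order, since nothing is filtered or re-joined.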
CodePudding user response:
There might be a better option, but this does the job by simply applying your rule row-wise with a lambda:
df.apply(lambda x: x[["col2", "col3"]].max() if x["col4"] else x["col1"], axis=1)
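As a rough sanity check, the sketch below runs this row-wise version next to a vectorized `where` on the sample rows from the question (hard-coded here, since the original data was random) and confirms they agree. The variable names are only illustrative. Note that `apply` with `axis=1` calls the Python lambda once per row, so on large dataframes it is generally slower than the vectorized form.

```python
import pandas as pd

# Sample data copied by hand from the dataframe printed in the question.
df = pd.DataFrame({
    'col1': [55, 82, 85, 90, 36, 33, 11, 31, 89, 4],
    'col2': [96, 59, 66, 69, 32, 69, 53, 51, 76, 76],
    'col3': [40, 34, 25, 27, 79, 80, 88, 96, 88, 47],
    'col4': [0, 1, 1, 0, 1, 1, 0, 0, 1, 0],
})

# Row-wise: a truthy col4 (here 1) picks max(col2, col3), otherwise col1.
by_apply = df.apply(
    lambda x: x[['col2', 'col3']].max() if x['col4'] else x['col1'],
    axis=1,
)

# Vectorized equivalent for comparison.
vectorized = df['col1'].where(df['col4'].eq(0),
                              df[['col2', 'col3']].max(axis=1))

# Compare values only; the two Series differ in their Name attribute.
same = by_apply.tolist() == vectorized.tolist()
print(same)  # True
```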