If I have a data frame consisting of the following values (exact values don't matter):
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0,100,size=(5, 4)), columns=list('ABCD'))
df
How do I add a fifth column 'E' whose values compare column A with columns B, C, and D? I want the result to be 1 if the value in column A is greater than the maximum of the values in columns B, C, and D for that row, and 0 if it is less than that maximum.
I tried the following:
df['E'] = np.where(df['A'] > max(df['B'], df['C'], df['D']), 1, 0)
I receive the following error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Thanks in advance!
CodePudding user response:
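Here is one way to do it, using pandas' max with axis=1 to take the row-wise maximum of B, C, and D. Python's built-in max fails here because it compares whole Series with >, and the resulting boolean Series has no single truth value.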
df['E']=np.where(df['A']> df[['B','C','D']].max(axis=1),
1,
0)
df
    A   B   C   D  E
0  92  23   7  68  1
1  23  79  79  38  0
2  66  19  29  92  0
3  13  40   4  36  0
4  39  28  51  90  0
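For a reproducible check, here is a minimal sketch of the same approach with the random data replaced by the fixed values from the output above. The names row_max and e_alt are only illustrative, and the astype(int) variant is an alternative I am adding, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Fixed values copied from the output above so the result is reproducible.
df = pd.DataFrame(
    {
        "A": [92, 23, 66, 13, 39],
        "B": [23, 79, 19, 40, 28],
        "C": [7, 79, 29, 4, 51],
        "D": [68, 38, 92, 36, 90],
    }
)

# Row-wise maximum of B, C, D: one value per row, aligned with df["A"].
row_max = df[["B", "C", "D"]].max(axis=1)

# Element-wise comparison gives a boolean Series; np.where maps it to 1/0.
df["E"] = np.where(df["A"] > row_max, 1, 0)

# Alternative with the same result: cast the boolean Series to integers.
e_alt = (df["A"] > row_max).astype(int)
```

Note that the comparison is strict, so a row where A equals the maximum of B, C, D gets 0 in both variants.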