Using the apply function in pandas, I want to compare multiple columns in a DataFrame to see whether their values are higher or lower than a numerical value. Then, based on the result of that condition, I want to output a string value in a new column. I am able to do this when comparing one column to the numerical value, but not with multiple columns. How would I do this with multiple columns? Below is the example I'm using. It works well for one column, but I cannot get it to work for several. In addition to column "c", which I'm comparing to the numerical value 99, I also want to compare columns "b" and "d" to 99.
(Note: I do not want to use the lambda function method.)
The code is below:
import pandas as pd
import numpy as np

data = {
    'a': [1, 15, 27, 399],
    'b': [2, 30, 45, 60],
    'c': [100, 200, 3, 78],
    'd': [4, 300, 400, 500],
}
dfgrass = pd.DataFrame(data)

def judge(x):
    if x > 99:
        return 'bingo'
    elif x < 99:
        return 'jack'

dfgrass['e'] = dfgrass['c'].apply(judge)
print(dfgrass)
CodePudding user response:
Try this:
dfgrass['e'] = np.where(dfgrass[['b', 'c', 'd']].gt(99).any(axis=1), 'bingo', 'jack')
Output:
>>> dfgrass
     a   b    c    d      e
0    1   2  100    4  bingo
1   15  30  200  300  bingo
2   27  45    3  400  bingo
3  399  60   78  500  bingo
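For anyone who wants to see what the np.where approach does step by step, here is a minimal sketch for columns b, c and d, assuming the dfgrass frame from the question. The names cols, above and mask are only illustrative and do not appear in the original answer.

```python
import numpy as np
import pandas as pd

dfgrass = pd.DataFrame({
    'a': [1, 15, 27, 399],
    'b': [2, 30, 45, 60],
    'c': [100, 200, 3, 78],
    'd': [4, 300, 400, 500],
})

cols = ['b', 'c', 'd']          # illustrative name: the columns to compare
above = dfgrass[cols].gt(99)    # boolean DataFrame, True where a value is > 99
mask = above.any(axis=1)        # one boolean per row: is any column above 99?

# Pick 'bingo' where the mask is True and 'jack' everywhere else
dfgrass['e'] = np.where(mask, 'bingo', 'jack')
print(dfgrass)
```

Since every row has at least one of b, c or d above 99, each row ends up labelled 'bingo' with this data. Replacing any with all would require every selected column to be above 99 instead.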
CodePudding user response:
To use your function with multiple columns you need two things:
1. Use axis=1 as a parameter of apply to pass each row to your function; otherwise your function receives columns.
2. Inside your function, if you use conditional statements, you have to use any or all (or anything else like sum) to aggregate / reduce the boolean vector; otherwise your function will raise the well-known ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
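Both points can be checked with a small sketch, assuming the dfgrass frame from the question. The helper count_items and the names per_column, per_row, row and flags are made up for this illustration.

```python
import pandas as pd

dfgrass = pd.DataFrame({
    'a': [1, 15, 27, 399],
    'b': [2, 30, 45, 60],
    'c': [100, 200, 3, 78],
    'd': [4, 300, 400, 500],
})
subset = dfgrass[['b', 'c', 'd']]

def count_items(x):
    # x is whatever apply hands over: a column or a row, as a Series
    return len(x)

per_column = subset.apply(count_items)          # default axis=0: one call per column
per_row = subset.apply(count_items, axis=1)     # axis=1: one call per row
print(per_column.tolist())   # [4, 4, 4]    -> each column holds 4 values
print(per_row.tolist())      # [3, 3, 3, 3] -> each row holds 3 values

row = subset.loc[0]          # the kind of Series judge receives with axis=1
flags = row > 99             # boolean Series: b False, c True, d False

try:
    if flags:                # several booleans in a single `if` is ambiguous
        pass
except ValueError as err:
    print('ValueError:', err)

print(flags.any())   # True: at least one value is above 99
print(flags.all())   # False: not every value is above 99
print(flags.sum())   # 1: number of values above 99
```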
You can modify your function as below:
def judge(x):
    if any(x > 99):  # <- HERE
        return 'bingo'
    else:
        return 'jack'

dfgrass['e'] = dfgrass[['b', 'c', 'd']].apply(judge, axis=1)  # <- HERE
print(dfgrass)
# Output:
     a   b    c    d      e
0    1   2  100    4  bingo
1   15  30  200  300  bingo
2   27  45    3  400  bingo
3  399  60   78  500  bingo
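The question does not say whether one column above 99 is enough or whether all three must be, so here is a rough sketch of both readings without lambda. The function names judge_any and judge_all and the columns e_any and e_all are made up for this comparison. Note that a value of exactly 99 falls into the 'jack' branch here, whereas the original judge returns None for it.

```python
import pandas as pd

dfgrass = pd.DataFrame({
    'a': [1, 15, 27, 399],
    'b': [2, 30, 45, 60],
    'c': [100, 200, 3, 78],
    'd': [4, 300, 400, 500],
})
cols = ['b', 'c', 'd']

def judge_any(row):
    # 'bingo' as soon as one of the selected columns is above 99
    if (row > 99).any():
        return 'bingo'
    return 'jack'

def judge_all(row):
    # 'bingo' only when every selected column is above 99
    if (row > 99).all():
        return 'bingo'
    return 'jack'

dfgrass['e_any'] = dfgrass[cols].apply(judge_any, axis=1)
dfgrass['e_all'] = dfgrass[cols].apply(judge_all, axis=1)
print(dfgrass)
```

With this data e_any is 'bingo' on every row, while e_all is 'jack' on every row because no row has b, c and d all above 99.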