How to Compare Multiple Columns and Produce Values in a Single New Column, Using the Apply Function in P

Time:12-23

Using the apply function in pandas, I want to compare multiple columns in a DataFrame to see whether their values are higher or lower than a numerical value. Then, based on the result of that condition, I want to output a string value in a new column. I am able to do this when comparing one column to the numerical value, but not with multiple columns. How would I do this with multiple columns? Below is the example I'm using. It works well for one column, but I cannot get it to work for multiple columns. In addition to column "c", which I'm comparing to the numerical value 99, I also want to compare columns "b" and "d" to 99.

(Note: I do not want to use the lambda function method.)

The code is below:

import pandas as pd
import numpy as np
data = { 'a': [1, 15, 27, 399], 
         'b': [2, 30, 45, 60],
         'c': [100,200, 3, 78],
         'd': [4, 300, 400, 500]
         }

dfgrass = pd.DataFrame(data)
def judge(x):
    if x > 99:
        return 'bingo'
    elif x < 99:
        return 'jack'

dfgrass['e'] = dfgrass['c'].apply(judge)

print(dfgrass)

CodePudding user response:

Try this:

dfgrass['e'] = np.where(dfgrass[['b', 'c', 'd']].gt(99).any(axis=1), 'bingo', 'jack')

Output:

>>> dfgrass
     a   b    c    d      e
0    1   2  100    4  bingo
1   15  30  200  300  bingo
2   27  45    3  400  bingo
3  399  60   78  500  bingo
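
As a self-contained sketch of this vectorized approach, the snippet below assumes the goal is 'bingo' when a value in columns b, c, or d exceeds 99, as described in the question. The names THRESHOLD, cols, above, any_above, and all_above are illustrative and not part of the original answer. It also shows how swapping any for all changes the rule from "at least one column" to "every column".

```python
import numpy as np
import pandas as pd

dfgrass = pd.DataFrame({
    'a': [1, 15, 27, 399],
    'b': [2, 30, 45, 60],
    'c': [100, 200, 3, 78],
    'd': [4, 300, 400, 500],
})

THRESHOLD = 99          # numerical value from the question
cols = ['b', 'c', 'd']  # columns the question asks to compare

# Boolean DataFrame: True where a cell is greater than the threshold
above = dfgrass[cols].gt(THRESHOLD)

# any(axis=1): True if at least one selected column in the row is above
any_above = np.where(above.any(axis=1), 'bingo', 'jack')

# all(axis=1): True only if every selected column in the row is above
all_above = np.where(above.all(axis=1), 'bingo', 'jack')

print(list(any_above))
print(list(all_above))
```

With this data every row has at least one value above 99 in b, c, or d, but no row has all three above 99, so the two rules give opposite labels.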

CodePudding user response:

To use your function with multiple columns, you need two things:

  1. Pass axis=1 to apply so that each row is passed to your function; otherwise your function receives columns.

  2. Inside your function, if you use conditional statements, reduce the boolean vector with any or all (or something else, such as sum); otherwise your function raises the well-known ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
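
The two points above can be checked with a small sketch. The helper names record and judge_scalar are hypothetical and exist only for this illustration; the data is the question's.

```python
import pandas as pd

dfgrass = pd.DataFrame({
    'a': [1, 15, 27, 399],
    'b': [2, 30, 45, 60],
    'c': [100, 200, 3, 78],
    'd': [4, 300, 400, 500],
})
subset = dfgrass[['b', 'c', 'd']]

# Point 1: see what apply hands to the function with and without axis=1.
seen_default = []
seen_rows = []

def record(store):
    # Build a function that remembers the name of each Series it receives
    def inner(x):
        store.append(x.name)
        return 0
    return inner

subset.apply(record(seen_default))       # default axis=0: one Series per column
subset.apply(record(seen_rows), axis=1)  # axis=1: one Series per row

# Point 2: a scalar-style condition fails once x is a whole row.
def judge_scalar(x):
    if x > 99:  # x > 99 is a boolean Series here, not a single bool
        return 'bingo'
    return 'jack'

try:
    subset.apply(judge_scalar, axis=1)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(set(seen_default))  # column labels
print(set(seen_rows))     # row index labels
print(error_message)
```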

You can modify your function as below:

def judge(x):
    if any(x > 99):  # <- HERE
        return 'bingo'
    else:
        return 'jack'

dfgrass['e'] = dfgrass[['b', 'c', 'd']].apply(judge, axis=1)  # <- HERE
print(dfgrass)

# Output:
     a   b    c    d      e
0    1   2  100    4  bingo
1   15  30  200  300  bingo
2   27  45    3  400  bingo
3  399  60   78  500  bingo
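
If the threshold should not be hard-coded, one possible variation is to pass it through the args parameter of apply. This is only a sketch: judge_row, the threshold parameter, the extra column 'f', and the value 350 are assumptions made for illustration and do not appear in the answer above.

```python
import pandas as pd

dfgrass = pd.DataFrame({
    'a': [1, 15, 27, 399],
    'b': [2, 30, 45, 60],
    'c': [100, 200, 3, 78],
    'd': [4, 300, 400, 500],
})

def judge_row(row, threshold):
    # row is a Series holding one row's values for the selected columns
    if (row > threshold).any():
        return 'bingo'
    return 'jack'

cols = ['b', 'c', 'd']

# Threshold from the question
dfgrass['e'] = dfgrass[cols].apply(judge_row, axis=1, args=(99,))

# Same function with a hypothetical higher threshold
dfgrass['f'] = dfgrass[cols].apply(judge_row, axis=1, args=(350,))

print(dfgrass)
```

With a threshold of 350, only the rows where d is 400 or 500 are labelled 'bingo', which makes it easier to see that the row-wise condition is doing the work.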