Creating a new column based on if-else condition using several other columns

Time:02-28

if data['weight'] < 50 | data['blood_diseases'] == 1 | data['age'] > 65 | data['hemoglobin_level'] < 12 | data['period_between_successive_blood_donations'] < 3:
    data['can'] = 0
else:
    data['can'] = 1 

I get this error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

How can I do it?

Answer:

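Add parentheses around each comparison: | binds more tightly than < and ==, so without them the conditions are not grouped the way you intend. Note that this only fixes the grouping. If data is a DataFrame, each comparison returns a Series, and an if statement on a Series still raises the same ValueError; the code below works as written only when each data[...] is a single scalar value (for example, the values of one row):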

if (data['weight'] < 50) | (data['blood_diseases'] == 1) | (data['age'] > 65) | (data['hemoglobin_level'] < 12) | (data['period_between_successive_blood_donations'] < 3):
    data['can'] = 0
else:
    data['can'] = 1 

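or, with the or keyword (again only for scalar values; applied to a Series, or raises the same ValueError):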

if (data['weight'] < 50) or (data['blood_diseases'] == 1) or (data['age'] > 65) or (data['hemoglobin_level'] < 12) or (data['period_between_successive_blood_donations'] < 3):
    data['can'] = 0
else:
    data['can'] = 1 
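If you want to keep a plain if/else, one option is to apply it row by row with DataFrame.apply(axis=1), so that every value the if sees is a scalar. This is a sketch, not part of the answer above: the sample values and the can_donate helper are hypothetical, and on large frames this is much slower than the vectorized approaches in the other answers.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
data = pd.DataFrame({
    "weight": [70, 45, 80],
    "blood_diseases": [0, 0, 1],
    "age": [30, 40, 50],
    "hemoglobin_level": [14, 13, 15],
    "period_between_successive_blood_donations": [6, 4, 5],
})

def can_donate(row):
    # Hypothetical helper: with apply(axis=1), row holds one row, so each
    # row[...] is a scalar and if/else with `or` is valid here.
    if (row["weight"] < 50 or row["blood_diseases"] == 1 or row["age"] > 65
            or row["hemoglobin_level"] < 12
            or row["period_between_successive_blood_donations"] < 3):
        return 0
    return 1

data["can"] = data.apply(can_donate, axis=1)
print(data["can"].tolist())  # [1, 0, 0]
```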

Answer:

  1. Don't use if/else. An if statement needs a single True/False value, but comparing a Series returns a Series of booleans, one per row. There are several possible ways to reduce that to one value (any row, all rows, non-empty), so pandas refuses to guess and raises. This is one reason for the error you see.

  2. You also need to enforce the desired evaluation order by using parentheses:

    • not only will data['weight'] < 50 | data['blood_diseases'] evaluate as data['weight'] < (50 | data['blood_diseases']), because | has a higher precedence than <,

    • but also data['weight'] < 50 | data['blood_diseases'] == 1 will evaluate as (data['weight'] < (50 | data['blood_diseases'])) and ((50 | data['blood_diseases']) == 1) due to chaining of comparison operators.

      This again tries to interpret a Series as a single boolean value, this time as the left operand of a and b, which is the second reason for the error.

  3. Don't use data['can'] = 0 or data['can'] = 1. Each of these assigns one constant value to the entire column instead of evaluating the other columns row by row.
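The first two points can be reproduced on a tiny example. This sketch uses two hypothetical standalone Series in place of the question's columns and shows the precedence problem, the chained-comparison error, the if error, and the explicit .any()/.all() reductions the error message suggests.

```python
import pandas as pd

# Hypothetical stand-ins for data['weight'] and data['blood_diseases'].
weight = pd.Series([70, 45, 80])
blood_diseases = pd.Series([1, 0, 0])

# Intended: compare first, then combine the boolean Series with |.
intended = (weight < 50) | (blood_diseases == 1)
print(intended.tolist())  # [True, True, False]

# Without parentheses | binds first: weight < (50 | blood_diseases),
# i.e. weight < [51, 50, 50], which is a different condition.
precedence_bug = weight < 50 | blood_diseases
print(precedence_bug.tolist())  # [False, True, False]

# Adding "== 1" makes it a chained comparison, which Python expands to
# (weight < X) and (X == 1); "and" needs a single bool from a Series.
try:
    weight < 50 | blood_diseases == 1
except ValueError as exc:
    chained_error = str(exc)
    print(chained_error)

# An if statement likewise needs one bool, not a Series.
try:
    if intended:
        pass
except ValueError as exc:
    if_error = str(exc)
    print(if_error)

# Reduce explicitly when you really want a single answer.
print(intended.any(), intended.all())  # True False
```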

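Just assign the expression to the new column. Because can should be False whenever any of the disqualifying conditions holds, negate the combined condition with ~:

data['can'] = ~((data['weight'] < 50) | (data['blood_diseases'] == 1) | (data['age'] > 65) | (data['hemoglobin_level'] < 12) | (data['period_between_successive_blood_donations'] < 3))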

If you really need 0/1 instead of boolean values, you can convert them; see "How can I map True/False to 1/0 in a Pandas DataFrame?"
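A minimal sketch of that conversion, assuming hypothetical sample rows with the question's column names: build the disqualifying mask, flip it with ~, and cast the booleans with astype(int), which maps True to 1 and False to 0.

```python
import pandas as pd

# Hypothetical sample rows; column names follow the question.
data = pd.DataFrame({
    "weight": [70, 45, 80],
    "blood_diseases": [0, 0, 0],
    "age": [30, 40, 70],
    "hemoglobin_level": [14, 13, 11],
    "period_between_successive_blood_donations": [6, 4, 2],
})

# True for rows that fail at least one criterion.
cannot_donate = (
    (data["weight"] < 50)
    | (data["blood_diseases"] == 1)
    | (data["age"] > 65)
    | (data["hemoglobin_level"] < 12)
    | (data["period_between_successive_blood_donations"] < 3)
)

# ~ flips the mask; astype(int) turns True/False into 1/0.
data["can"] = (~cannot_donate).astype(int)
print(data["can"].tolist())  # [1, 0, 0]
```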

Answer:

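Use np.where. Given a boolean mask, it returns the second argument where the mask is True and the third argument elsewhere: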

import numpy as np

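# True for every row that fails at least one donation criterion
mask = ((data['weight'] < 50) | (data['blood_diseases'] == 1) |
        (data['age'] > 65) | (data['hemoglobin_level'] < 12) |
        (data['period_between_successive_blood_donations'] < 3))
# can is 0 where the mask is True and 1 elsewhere, matching the question
data["can"] = np.where(mask, 0, 1)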
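As a usage sketch on hypothetical sample rows: the second row sits exactly on the hemoglobin_level and period thresholds, so it is not disqualified because the comparisons are strict. np.where is not limited to numbers; the same mask can pick text labels (the status column here is a hypothetical addition, not part of the question).

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; column names follow the question.
data = pd.DataFrame({
    "weight": [55, 60],
    "blood_diseases": [0, 0],
    "age": [66, 25],
    "hemoglobin_level": [13, 12],
    "period_between_successive_blood_donations": [4, 3],
})

mask = ((data["weight"] < 50) | (data["blood_diseases"] == 1) |
        (data["age"] > 65) | (data["hemoglobin_level"] < 12) |
        (data["period_between_successive_blood_donations"] < 3))

# 0/1 flag: the first row fails on age (66 > 65), the second passes.
data["can"] = np.where(mask, 0, 1)
# Hypothetical label column built from the same mask.
data["status"] = np.where(mask, "not eligible", "eligible")
print(data[["can", "status"]])
```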