Issue with multiple conditionals within a lambda function being applied to multiple columns

Time:10-21

I am attempting to populate a column titled 'label' using conditional statements within a lambda function that involves two columns of the data frame. I would like to create numerical labels based on specific conditions in the 'WY' and 'WY Week' columns. For example, the label is 1 if WY is less than 2010, 2 if WY is greater than 2010, and 3 if WY is greater than 2010 and the WY Week value is between 26 and 40.

I don't have an issue with a single conditional on one column, as seen below:

GC['label'] = GC['WY'].apply(lambda x: 1 if x >= 1985 else 0)

But I get an error when I attempt to write a conditional statement involving two columns and multiple conditions:

CJ['label'] = CJ[['WY','WY Week']].apply(lambda x,y: 1 if x < 2010 else (2 if x >= 2010 and (y >= 26 and y <= 40)) else )

The error is a syntax error:

File "<ipython-input-21-6b6fa416588d>", line 7
CJ['label'] = CJ[['WY','WY Week'].apply(lambda x,y: 1 if x < 2010 else (2 if x >= 2010) and (y >= 26 and y <= 40) else )
                                                                                      ^
SyntaxError: invalid syntax

I feel like I'm pretty close, but I would like some assistance, as it is one of several conditional statements that I need to write like this.

CodePudding user response:

Define a named function instead of trying to cram everything into a complex lambda.

There's no need to test x >= 2010 in the else; if it gets to the else, that must be true.

def labelval(x, y):
    if x < 2010:
        return 1
    elif 26 <= y <= 40:
        return 2
    else:
        return 3

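# DataFrame.apply passes one argument per call, so use axis=1 to get each row and unpack the two columns
CJ['label'] = CJ.apply(lambda row: labelval(row['WY'], row['WY Week']), axis=1)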
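
For anyone who wants to try this out, here is a minimal sketch on a small made-up data frame (the sample values are invented; only the column names come from the question). It assumes the row-wise form of apply (axis=1), where each row arrives as a single Series and the two columns are unpacked by name. The second assignment shows roughly what the original lambda might look like once the conditional expression is given a final else value.

```python
import pandas as pd

# Made-up sample data; only the column names come from the question.
CJ = pd.DataFrame({"WY": [2005, 2012, 2015], "WY Week": [30, 30, 10]})

def labelval(x, y):
    if x < 2010:
        return 1
    elif 26 <= y <= 40:
        return 2
    else:
        return 3

# axis=1 hands each row to the function as one Series,
# so the two columns are looked up by name inside the lambda.
CJ["label"] = CJ.apply(lambda row: labelval(row["WY"], row["WY Week"]), axis=1)

# The same logic as a single chained conditional expression.
# Every "if" in a conditional expression needs its own "else" value.
CJ["label_lambda"] = CJ.apply(
    lambda row: 1 if row["WY"] < 2010 else (2 if 26 <= row["WY Week"] <= 40 else 3),
    axis=1,
)
```

The named function is usually easier to read and extend when there are several of these label rules to write.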

CodePudding user response:

# hopefully a readable function that makes label conditions clear
def classify(wy, wy_week):
    if wy < 2010:
        return 1
    elif 26 <= wy_week <= 40:
        return 2
    else:
        return 3 # I guess?

# element-wise calculation over two columns (map loops in Python, but avoids the overhead of a row-wise apply)
GC['label'] = list(map(classify, GC['WY'], GC['WY Week']))

One of my favorite Stack Overflow answers ever: Performance of Pandas apply vs np.vectorize to create new column from existing columns
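
If performance matters, one possible fully vectorized alternative is numpy.select, which evaluates the conditions on whole columns at once. This is only a sketch on made-up sample values, and it assumes the same three-way rule as the function above (1 before 2010, 2 for weeks 26 to 40, otherwise 3). The data frame name df is a placeholder.

```python
import numpy as np
import pandas as pd

# Made-up sample data; only the column names come from the question.
df = pd.DataFrame({"WY": [2005, 2012, 2015, 2010], "WY Week": [30, 30, 10, 26]})

# Conditions are checked in order; the first one that is True wins.
conditions = [
    df["WY"] < 2010,
    df["WY Week"].between(26, 40),  # inclusive on both ends by default
]
choices = [1, 2]

# Rows matching none of the conditions get the default label.
df["label"] = np.select(conditions, choices, default=3)
```

Because the conditions are tried in order, the second one only applies to rows where WY is 2010 or later, which mirrors the if/elif structure.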
