Why is my if statement not returning the return statement?

Time:03-09

I have a dataframe with multiple columns that I want to check against one another. The function below contains several if statements:

def bucketing(row):
    if row['NATIONALITY'] == 'RU' and row['party_other_nationality'] in other_party:
        return 'low risk'
    elif row['NATIONALITY'] == 'RU' and row['party_other_nationality'] not in other_party:
        return 'high risk'    
    
    if row['CTRY_RSD'] == 'RU' and row['NATIONALITY'] != 'RU' and row['CTRY_DMCL'] != 'RU' and row['INCORP_CNTRY'] != 'RUSSIA' and row['SOURCEOFWEALTH'] != 'RU':
        return 'low risk'
    else:
        return 'high risk'
    
    if row['CTRY_DMCL'] == 'RU' and row['NATIONALITY'] != 'RU' and row['CTRY_RSD'] != 'RU' and row['INCORP_CNTRY'] != 'RUSSIA' and row['SOURCEOFWEALTH'] != 'RU':
        return 'low risk'
    else:
        return 'high risk'
    
    if row['INCORP_CNTRY'] == 'RUSSIA' and row['NATIONALITY'] != 'RU' and row['CTRY_RSD'] != 'RU' and row['CTRY_DMCL'] != 'RU' and row['SOURCEOFWEALTH'] != 'RU':
        return 'low risk'
    else:
        return 'high risk'
    
    if row['SOURCEOFWEALTH'] == 'RU' and row['NATIONALITY'] != 'RU' and row['CTRY_RSD'] != 'RU' and row['CTRY_DMCL'] != 'RU' and row['INCORP_CNTRY'] != 'RUSSIAN FEDERATION':
        return 'low risk'
    elif row['SOURCEOFWEALTH'] == 'RU' and row['NATIONALITY'] == 'RU' or row['CTRY_RSD'] == 'RU' or row['CTRY_DMCL'] == 'RU' or row['INCORP_CNTRY'] == 'RUSSIAN FEDERATION':
        return 'high risk'
    elif row['SOURCEOFWEALTH'] != 'RU' and row['NATIONALITY'] != 'RU' and row['CTRY_RSD'] != 'RU' and row['CTRY_DMCL'] != 'RU' and row['INCORP_CNTRY'] != 'RUSSIAN FEDERATION':
        return 'No material link'

For each one, my understanding is as follows: where I have written 'else', the function should return 'high risk' if any of the conditions after the first 'and' is not met.

Example:

    if row['CTRY_DMCL'] == 'RU' and row['NATIONALITY'] != 'RU' and row['CTRY_RSD'] != 'RU' and row['INCORP_CNTRY'] != 'RUSSIA' and row['SOURCEOFWEALTH'] != 'RU':
        return 'low risk'
    else:
        return 'high risk'

I would like this to return 'high risk', if the CTRY_DMCL column has 'RU' and any of the other columns have RU or RUSSIA.

The one that is not working is below:

    if row['SOURCEOFWEALTH'] == 'RU' and row['NATIONALITY'] != 'RU' and row['CTRY_RSD'] != 'RU' and row['CTRY_DMCL'] != 'RU' and row['INCORP_CNTRY'] != 'RUSSIAN FEDERATION':
        return 'low risk'
    elif row['SOURCEOFWEALTH'] == 'RU' and row['NATIONALITY'] == 'RU' or row['CTRY_RSD'] == 'RU' or row['CTRY_DMCL'] == 'RU' or row['INCORP_CNTRY'] == 'RUSSIAN FEDERATION':
        return 'high risk'
    elif row['SOURCEOFWEALTH'] != 'RU' and row['NATIONALITY'] != 'RU' and row['CTRY_RSD'] != 'RU' and row['CTRY_DMCL'] != 'RU' and row['INCORP_CNTRY'] != 'RUSSIAN FEDERATION':
        return 'No material link'

I do not get 'low risk' returned when only 'SOURCEOFWEALTH' has a Russian value and nothing else does. That case returns 'high risk' instead.

Could it be how I structured my function?

This is how I am running the function against my dataframe: merged['NEW COLUMN'] = merged.apply(bucketing, axis=1)

I am now wondering whether my if statements, as written, check all of the conditions or only the very last one in each statement.

Thanks

CodePudding user response:

In most languages, including Python and SQL, and binds more tightly than or, so a and b or c or d is evaluated as (a and b) or c or d. Whenever you mix and with or, add parentheses to make the intended grouping explicit; if every condition is joined with and, no parentheses are needed. Your second clause mixes both, so it returns 'high risk' whenever any one of the conditions after an or is true, regardless of SOURCEOFWEALTH. See the adjustment below, which groups the or conditions so that SOURCEOFWEALTH must be 'RU' together with at least one of them:

    if row['SOURCEOFWEALTH'] == 'RU' and row['NATIONALITY'] != 'RU' and row['CTRY_RSD'] != 'RU' and row['CTRY_DMCL'] != 'RU' and row['INCORP_CNTRY'] != 'RUSSIAN FEDERATION':
        return 'low risk'
    elif row['SOURCEOFWEALTH'] == 'RU' and (row['NATIONALITY'] == 'RU' or row['CTRY_RSD'] == 'RU' or row['CTRY_DMCL'] == 'RU' or row['INCORP_CNTRY'] == 'RUSSIAN FEDERATION'):
        return 'high risk'
    elif row['SOURCEOFWEALTH'] != 'RU' and row['NATIONALITY'] != 'RU' and row['CTRY_RSD'] != 'RU' and row['CTRY_DMCL'] != 'RU' and row['INCORP_CNTRY'] != 'RUSSIAN FEDERATION':
        return 'No material link'
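
To see the grouping effect in isolation, the sketch below compares an ungrouped expression with two parenthesised versions. It also illustrates a second thing that may be worth checking in the full function from the question: an if whose else branch also returns ends the function for every row that reaches it, so blocks written after it never run. In the posted function, a row where only SOURCEOFWEALTH is 'RU' appears to leave through the else of the CTRY_RSD block, which would explain the 'high risk' result. The function names and the two-column rows here are hypothetical simplifications, not the original logic.

```python
# Part 1: `and` binds more tightly than `or`.
def ungrouped(a, b, c):
    return a and b or c


def as_parsed(a, b, c):
    return (a and b) or c      # what Python evaluates for the ungrouped form


def as_intended(a, b, c):
    return a and (b or c)      # `a` must hold together with b or c


# Part 2: an `else: return` ends the function, so later blocks are unreachable.
def bucket_with_else(row):
    if row["CTRY_RSD"] == "RU" and row["SOURCEOFWEALTH"] != "RU":
        return "low risk"
    else:
        return "high risk"     # every other row leaves the function here
    # Never reached:
    if row["SOURCEOFWEALTH"] == "RU" and row["CTRY_RSD"] != "RU":
        return "low risk"


def bucket_fallthrough(row):
    # No catch-all return until every specific case has been tested.
    if row["CTRY_RSD"] == "RU" and row["SOURCEOFWEALTH"] != "RU":
        return "low risk"
    if row["SOURCEOFWEALTH"] == "RU" and row["CTRY_RSD"] != "RU":
        return "low risk"
    if row["CTRY_RSD"] != "RU" and row["SOURCEOFWEALTH"] != "RU":
        return "No material link"
    return "high risk"


# Hypothetical row: only the source of wealth is Russian.
row = {"CTRY_RSD": "US", "SOURCEOFWEALTH": "RU"}
print(ungrouped(False, True, True), as_intended(False, True, True))
print(bucket_with_else(row), "|", bucket_fallthrough(row))
```

With the else in place the sample row is labelled 'high risk'; with the fall-through structure it is labelled 'low risk'. Plain dicts stand in for dataframe rows here because row['COLUMN'] lookups work the same way on both.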

However, avoid row-wise conditional checks via DataFrame.apply with axis=1 (a hidden loop) in favor of vectorized operations like numpy.select, which assign the entire column at once rather than row by row. See the draft below, to be adjusted, completed, and tested:

import numpy as np

# LIST OF BOOLEAN ARRAY CONDITIONS (IDEALLY MUTUALLY EXCLUSIVE)
conditions = [
    (mydata['NATIONALITY'] == 'RU') & (mydata['party_other_nationality'].isin(other_party)),
    (mydata['NATIONALITY'] == 'RU') & (~mydata['party_other_nationality'].isin(other_party)),
    (mydata['CTRY_RSD'] == 'RU') & (mydata['NATIONALITY'] != 'RU') & (mydata['CTRY_DMCL'] != 'RU') & (mydata['INCORP_CNTRY'] != 'RUSSIA') & (mydata['SOURCEOFWEALTH'] != 'RU'),
    (mydata['CTRY_DMCL'] == 'RU') & (mydata['NATIONALITY'] != 'RU') & (mydata['CTRY_RSD'] != 'RU') & (mydata['INCORP_CNTRY'] != 'RUSSIA') & (mydata['SOURCEOFWEALTH'] != 'RU'),
    (mydata['SOURCEOFWEALTH'] == 'RU') & (mydata['NATIONALITY'] != 'RU') & (mydata['CTRY_RSD'] != 'RU') & (mydata['CTRY_DMCL'] != 'RU') & (mydata['INCORP_CNTRY'] != 'RUSSIAN FEDERATION'),
     ...
]

# LIST OF RETURN VALUES (EQUAL LENGTH AS CONDITIONS)
choices = [
    'low risk', 'high risk', 'low risk', 'low risk', 'low risk', ...
]

mydata["new_risk_column"] = np.select(conditions, choices, default="No material link")
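
For something that runs end to end, here is a minimal sketch of the same numpy.select idea on a made-up dataframe. It swaps the long chains of != comparisons for a count of Russian links per row, which is an assumption about the intended rule (exactly one link is 'low risk', more than one is 'high risk'), not something stated in the question. The sample values, the contents of other_party, and the choice to accept both 'RUSSIA' and 'RUSSIAN FEDERATION' in INCORP_CNTRY are likewise hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical list; the question does not show what other_party contains.
other_party = ["GB", "FR"]

# Made-up sample rows using the column names from the question.
merged = pd.DataFrame({
    "NATIONALITY":             ["RU", "RU", "US", "US", "US"],
    "party_other_nationality": ["GB", "CN", "", "", ""],
    "CTRY_RSD":                ["RU", "RU", "US", "RU", "US"],
    "CTRY_DMCL":               ["US", "US", "US", "US", "US"],
    "INCORP_CNTRY":            ["USA", "USA", "USA", "USA", "USA"],
    "SOURCEOFWEALTH":          ["US", "US", "RU", "RU", "US"],
})

nat_ru = merged["NATIONALITY"] == "RU"
other_ok = merged["party_other_nationality"].isin(other_party)

# Number of Russian links per row across the non-nationality columns.
links = (
    (merged["CTRY_RSD"] == "RU").astype(int)
    + (merged["CTRY_DMCL"] == "RU").astype(int)
    + merged["INCORP_CNTRY"].isin(["RUSSIA", "RUSSIAN FEDERATION"]).astype(int)
    + (merged["SOURCEOFWEALTH"] == "RU").astype(int)
)

# np.select uses the first condition that is True for each row,
# so the nationality rules take priority over the link count.
conditions = [
    nat_ru & other_ok,
    nat_ru & ~other_ok,
    links == 1,
    links > 1,
]
choices = ["low risk", "high risk", "low risk", "high risk"]

merged["NEW COLUMN"] = np.select(conditions, choices, default="No material link")
print(merged[["NATIONALITY", "SOURCEOFWEALTH", "NEW COLUMN"]])
```

Because np.select takes the first matching condition per row, the order of the list plays the role that the order of the if blocks plays in the row-wise function, without any early return hiding later rules.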