How to apply a function across two columns in pandas?

I am writing a function that checks whether two columns satisfy a condition and, if so, returns a statement in a new column. I thought I could just use df.apply(function), but it does not seem to work.

def bucketing(df):
    if df['NATIONALITY'] == 'RU' and df['CTRY_OF_RESIDENCE'] == 'Russia':
        return 'High Risk'

merged.apply(bucketing, axis = 1)

This is my error:

TypeError: unsupported operand type(s) for |: 'str' and 'str'

My expected output would be a new column with the string 'High Risk' returned if the above condition is met.

Is there a more efficient way of doing this?

Thanks

CodePudding user response：

Here is an easier way:

import numpy as np
df['new col'] = np.where(
    (df['NATIONALITY'] == 'RU') & (df['CTRY_OF_RESIDENCE'] == 'Russia'), 'High Risk',
    np.where((df['NATIONALITY'] == 'UK') & (df['CTRY_OF_RESIDENCE'] == 'Ukraine'), 'Medium Risk', ''))
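
If more buckets get added later, nested np.where calls become hard to read. Here is a minimal sketch of the same logic with np.select, which takes a list of conditions and a matching list of labels. The sample DataFrame is made up for illustration; only the column names come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    'NATIONALITY': ['RU', 'UK', 'RU'],
    'CTRY_OF_RESIDENCE': ['Russia', 'Ukraine', 'America'],
})

# One boolean Series per bucket. Each comparison is wrapped in parentheses
# because & binds more tightly than ==.
conditions = [
    (df['NATIONALITY'] == 'RU') & (df['CTRY_OF_RESIDENCE'] == 'Russia'),
    (df['NATIONALITY'] == 'UK') & (df['CTRY_OF_RESIDENCE'] == 'Ukraine'),
]
labels = ['High Risk', 'Medium Risk']

# Rows that match no condition receive the default (an empty string here).
df['new col'] = np.select(conditions, labels, default='')
```

Conditions are checked in order and the first match wins, so put the most specific bucket first if they can overlap.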

CodePudding user response：

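If you still want to use your own function, I think this would work, but a sample DataFrame would help to check.

def bucketing(row):
    # With axis=1 each row arrives as a Series, so row[...] is a single value:
    # combine the two comparisons with 'and', not '&'.
    if row['NATIONALITY'] == 'RU' and row['CTRY_OF_RESIDENCE'] == 'Russia':
        return 'High Risk'
    # Rows that do not match fall through and get None.

df['NEW COLUMN'] = df.apply(bucketing, axis=1)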
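
A note on the operators, since this is where errors like the one in the question usually come from. Inside a row-wise function the values are plain Python strings, so 'and' is the operator to use. The bitwise operators & and | are meant for element-wise comparisons on whole columns, and they bind more tightly than ==, so without parentheses Python first tries to combine the two strings. The sketch below reproduces that on a single hypothetical row; the names row, error_message and is_high_risk are just for illustration.

```python
import pandas as pd

# A single hypothetical row, shaped like what apply(..., axis=1) passes in.
row = pd.Series({'NATIONALITY': 'RU', 'CTRY_OF_RESIDENCE': 'Russia'})

# Without parentheses this parses as
#   row['NATIONALITY'] == ('RU' & row['CTRY_OF_RESIDENCE']) == 'Russia'
# and str & str is not defined, so Python raises a TypeError.
try:
    row['NATIONALITY'] == 'RU' & row['CTRY_OF_RESIDENCE'] == 'Russia'
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# For single values, the logical keyword 'and' is the right tool.
is_high_risk = row['NATIONALITY'] == 'RU' and row['CTRY_OF_RESIDENCE'] == 'Russia'
```

The same precedence rule is why the column-wise versions wrap every comparison in parentheses before combining them with &.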

CodePudding user response：

I would use np.where() to get what you are looking for:

import numpy as np
import pandas as pd

# Small sample frame to test the condition on
data = {'Name': ['John Smith', 'Jane Doe'],
        'NATIONALITY': ['RU', 'NA'],
        'CTRY_OF_RESIDENCE': ['Russia', 'America']
        }

df = pd.DataFrame(data)
df['new col'] = np.where((df['NATIONALITY'] == 'RU') & (df['CTRY_OF_RESIDENCE'] == 'Russia'), 'High Risk', '')
df
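
Another option, sketched on the same sample data, is to build the boolean mask once and assign through .loc. This is an alternative to np.where rather than something the question requires, and the variable name mask is my own choice.

```python
import pandas as pd

# Same sample data as above.
df = pd.DataFrame({
    'Name': ['John Smith', 'Jane Doe'],
    'NATIONALITY': ['RU', 'NA'],
    'CTRY_OF_RESIDENCE': ['Russia', 'America'],
})

# Element-wise condition over whole columns; note the parentheses around
# each comparison.
mask = (df['NATIONALITY'] == 'RU') & (df['CTRY_OF_RESIDENCE'] == 'Russia')

# Start with an empty label everywhere, then overwrite only the matching rows.
df['new col'] = ''
df.loc[mask, 'new col'] = 'High Risk'
```

Keeping the mask in its own variable is handy if you also want to filter or count the matching rows, for example with mask.sum().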