I am writing a function to check whether two columns satisfy a condition and, if so, to return a new column containing a label. I thought I could just use df.apply(function), but it does not seem to work:
def bucketing(df):
    if df['NATIONALITY'] == 'RU' and df['CTRY_OF_RESIDENCE'] == 'Russia':
        return 'High Risk'

merged.apply(bucketing, axis=1)
This is my error:
TypeError: unsupported operand type(s) for |: 'str' and 'str'
My expected output would be a new column with the string 'High Risk' returned if the above condition is met.
Is there a more efficient way of doing this?
Thanks
CodePudding user response:
Here is an easier way:
import numpy as np

# Nested np.where: 'High Risk' first, then 'Medium Risk', otherwise ''
df['new col'] = np.where(
    (df['NATIONALITY'] == 'RU') & (df['CTRY_OF_RESIDENCE'] == 'Russia'),
    'High Risk',
    np.where(
        (df['NATIONALITY'] == 'UK') & (df['CTRY_OF_RESIDENCE'] == 'Ukraine'),
        'Medium Risk',
        ''))
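If more risk tiers are added, nested np.where calls get hard to read. Here is a minimal sketch of the same logic with np.select, which takes a list of conditions and a matching list of labels. The sample rows are made up for illustration and are not the asker's data:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, only for illustration.
df = pd.DataFrame({
    'NATIONALITY': ['RU', 'UK', 'RU'],
    'CTRY_OF_RESIDENCE': ['Russia', 'Ukraine', 'America'],
})

# One boolean Series per tier; each comparison needs its own parentheses
# because & binds more tightly than ==.
conditions = [
    (df['NATIONALITY'] == 'RU') & (df['CTRY_OF_RESIDENCE'] == 'Russia'),
    (df['NATIONALITY'] == 'UK') & (df['CTRY_OF_RESIDENCE'] == 'Ukraine'),
]
labels = ['High Risk', 'Medium Risk']

# The first condition that is True wins; rows matching none get ''.
df['new col'] = np.select(conditions, labels, default='')
print(df)
```

Adding another tier then only means appending one entry to conditions and one to labels.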
CodePudding user response:
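If you still want to use your own function, this should work, though a sample DataFrame would help to check. With axis=1 the function receives one row at a time, so each value is a plain string and the two comparisons are combined with and (not &, which binds more tightly than == and would try to evaluate 'RU' & row['CTRY_OF_RESIDENCE']):
def bucketing(row):
    # row is a single row (a Series); its values are scalars, so use `and`
    if row['NATIONALITY'] == 'RU' and row['CTRY_OF_RESIDENCE'] == 'Russia':
        return 'High Risk'
    # rows that do not match fall through and get None

df['NEW COLUMN'] = df.apply(bucketing, axis=1)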
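To sanity-check the row-wise approach, here is a minimal sketch on made-up data (the two sample rows and the empty-string fallback are assumptions, not part of the question). Because apply with axis=1 passes each row as a Series of scalar values, the comparisons are joined with Python's and. Writing & there fails with a TypeError of the same kind as the one in the question, since & binds more tightly than == and ends up being applied to two strings:

```python
import pandas as pd

# Hypothetical sample data, only for illustration.
df = pd.DataFrame({
    'NATIONALITY': ['RU', 'NA'],
    'CTRY_OF_RESIDENCE': ['Russia', 'America'],
})

def bucketing(row):
    # Scalar comparisons inside a row function: use `and`, not `&`.
    if row['NATIONALITY'] == 'RU' and row['CTRY_OF_RESIDENCE'] == 'Russia':
        return 'High Risk'
    return ''  # assumption: non-matching rows get an empty label

df['NEW COLUMN'] = df.apply(bucketing, axis=1)
print(df['NEW COLUMN'].tolist())

# What happens when a bitwise operator is applied to two plain strings:
try:
    'RU' & 'Russia'
    message = ''
except TypeError as exc:
    message = str(exc)
print(message)
```

On large frames the vectorized np.where versions are usually faster, since apply with axis=1 calls the function once per row.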
CodePudding user response:
I would use np.where() to get what you are looking for:
import numpy as np
import pandas as pd

data = {'Name': ['John Smith', 'Jane Doe'],
        'NATIONALITY': ['RU', 'NA'],
        'CTRY_OF_RESIDENCE': ['Russia', 'America']
        }
df = pd.DataFrame(data)

# 'High Risk' where both conditions hold, empty string otherwise
df['new col'] = np.where((df['NATIONALITY'] == 'RU') & (df['CTRY_OF_RESIDENCE'] == 'Russia'), 'High Risk', '')
df