I am writing a function that checks whether a value is in a list and, if so, returns a comment, as shown below:
iso_list = ['FR', 'UK', 'GER']
def bucketing(row):
    if row['NATIONALITY'] == 'RU' and row['party_other_list'] in iso_list:
        return 'Exempt EU national'
    elif row['NATIONALITY'] == 'RU' and row['party_other_list'] not in iso_list:
        return 'high risk nationality'
The problem is that a few of the rows I want to check have two values assigned, e.g. the final row below:
party_other_list |
---|
FR |
UK |
UK,RU |
Since UK is in my list, I want the last row to fall under my first condition. However, here it is part of a dual-country value, 'UK,RU', so the membership test fails.
How do I capture these rows which have dual components, one of which falls under my list?
CodePudding user response:
Split the string into a list, then use the any() function.
def bucketing(row):
    if row['NATIONALITY'] == 'RU':
        other_list = row['party_other_list'].split(',')
        if any(other in iso_list for other in other_list):
            return 'Exempt EU national'
        else:
            return 'high risk nationality'
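As a quick check, the function above can be run on the values from the question. This is only a sketch: plain dicts stand in for DataFrame rows, and the last two rows are made up to exercise the remaining branches.

```python
iso_list = ['FR', 'UK', 'GER']

def bucketing(row):
    if row['NATIONALITY'] == 'RU':
        other_list = row['party_other_list'].split(',')
        if any(other in iso_list for other in other_list):
            return 'Exempt EU national'
        else:
            return 'high risk nationality'
    # Rows with any other NATIONALITY fall through and return None

# Plain dicts standing in for DataFrame rows; the last two are hypothetical
rows = [
    {'NATIONALITY': 'RU', 'party_other_list': 'FR'},
    {'NATIONALITY': 'RU', 'party_other_list': 'UK,RU'},
    {'NATIONALITY': 'RU', 'party_other_list': 'RU'},
    {'NATIONALITY': 'FR', 'party_other_list': 'UK'},
]
results = [bucketing(row) for row in rows]
```

With a DataFrame, the same function would typically be applied row by row with df.apply(bucketing, axis=1).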
CodePudding user response:
You could use str.contains to check whether each value in "party_other_list" contains any of the country codes in iso_list. Then use numpy.select to select values depending on which condition is satisfied, instead of applying a function to each row.
import numpy as np
cond = df['party_other_list'].str.contains('|'.join(iso_list))
RU = df['NATIONALITY'].eq('RU')
df['status'] = np.select([RU & cond, RU & ~cond], ['Exempt EU national', 'high risk nationality'], np.nan)
Output:
party_other_list NATIONALITY status
0 FR RU Exempt EU national
1 UK RU Exempt EU national
2 UK,RU RU Exempt EU national
Note that this is meant to replace bucketing entirely.
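One caveat worth keeping in mind: '|'.join(iso_list) builds the regular expression 'FR|UK|GER', which matches anywhere inside the string, so a longer code that merely contains 'UK' would also match. The sketch below uses the standard re module to compare that pattern with a stricter one anchored at word boundaries. The value 'UKR' is a made-up example, and the names loose and strict are illustrative only.

```python
import re

iso_list = ['FR', 'UK', 'GER']

# Substring pattern, as built by '|'.join(iso_list): 'FR|UK|GER'
loose = re.compile('|'.join(iso_list))

# Whole-code pattern: \b requires a word boundary on both sides of the code
strict_pattern = r'\b(?:' + '|'.join(map(re.escape, iso_list)) + r')\b'
strict = re.compile(strict_pattern)

samples = ['FR', 'UK,RU', 'UKR', 'RU']  # 'UKR' is a hypothetical value
loose_hits = [bool(loose.search(s)) for s in samples]
strict_hits = [bool(strict.search(s)) for s in samples]
```

If whole-code matching is what you need, strict_pattern could be passed to str.contains in place of '|'.join(iso_list), since str.contains treats its argument as a regular expression by default.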
CodePudding user response:
You can use the set method isdisjoint together with the string method split. I also restructured the if/else block so that the condition row['NATIONALITY'] == 'RU' is evaluated only once.
The following solution assumes that multiple countries in party_other_list are separated only by a ',', with no surrounding whitespace.
iso_list = {'FR', 'UK', 'GER'}
def bucketing(row):
    if row['NATIONALITY'] == 'RU':
        countries = row['party_other_list'].split(',')  # 'UK,RU' -> ['UK', 'RU']
        # Be careful, I switched the conditions
        if iso_list.isdisjoint(countries):  # True if iso_list and countries are disjoint and False otherwise
            return 'high risk nationality'
        else:
            return 'Exempt EU national'
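If the data might contain spaces around the commas (for example 'UK, RU'), the separator assumption can be relaxed by stripping each code before the comparison. This is a minimal sketch, not part of the solution above: bucketing_tolerant and iso_set are hypothetical names, and plain dicts stand in for DataFrame rows.

```python
iso_set = {'FR', 'UK', 'GER'}

def bucketing_tolerant(row):
    """Hypothetical variant of bucketing that ignores spaces around codes."""
    if row['NATIONALITY'] == 'RU':
        # 'UK, RU' -> {'UK', 'RU'}: split on commas, then strip whitespace
        countries = {code.strip() for code in row['party_other_list'].split(',')}
        if iso_set.isdisjoint(countries):
            return 'high risk nationality'
        return 'Exempt EU national'
    return None  # other nationalities are left unclassified

# Made-up rows for illustration
rows = [
    {'NATIONALITY': 'RU', 'party_other_list': 'UK, RU'},
    {'NATIONALITY': 'RU', 'party_other_list': ' RU '},
    {'NATIONALITY': 'RU', 'party_other_list': 'GER'},
]
labels = [bucketing_tolerant(row) for row in rows]
```

Without the strip() call, ' RU' and 'RU' would be treated as different codes, and 'UK, RU' would only match because 'UK' happens to come first.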