How can I access an element in my list, if it is joined by a special character?

I am writing a function which checks if a value is in a list, and if so, returns a comment like below:

iso_list = ['FR','UK', 'GER']

def bucketing(row):
    if row['NATIONALITY'] == 'RU' and row['party_other_list'] in iso_list:
        return 'Exempt EU national'
    elif row['NATIONALITY'] == 'RU' and row['party_other_list'] not in iso_list:
        return 'high risk nationality'

The problem is that a few of the rows I want to check contain two values, e.g. the final row below:

party_other_list
FR
UK
UK,RU

Since UK is in my list, I want the last row to fall under my first condition. However, it appears there as part of a two-country value, 'UK,RU', so the membership check against iso_list fails.

How do I capture rows that contain two components when one of them is in my list?

CodePudding user response：

Split the string on the comma, then use the any() function to check whether at least one of the parts is in iso_list.

def bucketing(row):
    if row['NATIONALITY'] == 'RU':
        other_list = row['party_other_list'].split(',')
        if any(other in iso_list for other in other_list):
            return 'Exempt EU national'
        else:
            return 'high risk nationality'
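
As a quick check of the approach above, the function can be called on plain dictionaries, since it only indexes the row by key. This is a minimal sketch: the sample rows are made up for illustration, and with a DataFrame the same function would typically be passed to df.apply(bucketing, axis=1).

```python
iso_list = ['FR', 'UK', 'GER']

def bucketing(row):
    if row['NATIONALITY'] == 'RU':
        other_list = row['party_other_list'].split(',')
        if any(other in iso_list for other in other_list):
            return 'Exempt EU national'
        else:
            return 'high risk nationality'

# Hypothetical sample rows, standing in for DataFrame rows
rows = [
    {'NATIONALITY': 'RU', 'party_other_list': 'FR'},
    {'NATIONALITY': 'RU', 'party_other_list': 'UK,RU'},
    {'NATIONALITY': 'RU', 'party_other_list': 'CN'},
    {'NATIONALITY': 'FR', 'party_other_list': 'UK'},
]
results = [bucketing(row) for row in rows]
print(results)
```

Note that the last row yields None, because the function has no branch for nationalities other than 'RU'.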

CodePudding user response：

You could use str.contains to check whether each value in "party_other_list" contains any of the country codes in iso_list. Then use numpy.select to choose the status for each row depending on which condition holds, instead of applying a function row by row.

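import numpy as np
# True where the value matches any code in iso_list (regex alternation 'FR|UK|GER')
cond = df['party_other_list'].str.contains('|'.join(iso_list))
# True where the nationality is RU
RU = df['NATIONALITY'].eq('RU')
# default=None leaves rows that meet neither condition empty; mixing string
# choices with a np.nan default can raise a TypeError on recent NumPy versions
df['status'] = np.select([RU & cond, RU & ~cond], ['Exempt EU national', 'high risk nationality'], default=None)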

Output:

  party_other_list NATIONALITY              status
0               FR          RU  Exempt EU national
1               UK          RU  Exempt EU national
2            UK,RU          RU  Exempt EU national

Note that this is meant to replace bucketing entirely.
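
One caveat with the pattern above: str.contains treats it as a regular expression by default and matches substrings, so a longer code that merely contains 'UK' would also count as a match. The sketch below uses Python's re module to show the difference and one possible stricter pattern. The value 'UKR' is a made-up example, and loose_pattern, strict_pattern and matches are hypothetical names, not part of the original answer.

```python
import re

iso_list = ['FR', 'UK', 'GER']

# Pattern from the answer: plain alternation, matches anywhere in the string
loose_pattern = '|'.join(iso_list)

# Assumed stricter variant: escape each code and require word boundaries
strict_pattern = r'\b(?:' + '|'.join(re.escape(code) for code in iso_list) + r')\b'

def matches(pattern, value):
    """Return True if the pattern is found anywhere in value."""
    return re.search(pattern, value) is not None

loose_hit = matches(loose_pattern, 'UKR')    # True: 'UK' is a substring of 'UKR'
strict_hit = matches(strict_pattern, 'UKR')  # False: no word boundary after 'UK'
dual_hit = matches(strict_pattern, 'UK,RU')  # True: the comma ends the word 'UK'
```

If the stricter behaviour suits your data, strict_pattern could be passed to str.contains in place of '|'.join(iso_list).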

CodePudding user response：

You can use the set method isdisjoint together with the string method split. I also restructured the if/else block so that the condition row['NATIONALITY'] == 'RU' is evaluated only once.

The following solution assumes that the countries in party_other_list are separated only by a ',' (with no surrounding spaces).

iso_list = {'FR', 'UK', 'GER'}
def bucketing(row):
    if row['NATIONALITY'] == 'RU':
        countries = row['party_other_list'].split(',')  # 'UK,RU' -> ['UK', 'RU']
        # Be careful: the two branches are swapped compared with the original function
        if iso_list.isdisjoint(countries):  # True if none of the countries is in iso_list
            return 'high risk nationality'
        else:
            return 'Exempt EU national'
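
If the data might contain spaces around the comma (for example 'UK, RU'), the assumption above no longer holds. Here is a minimal sketch of one way to handle that, stripping each part before the comparison. parse_codes is a hypothetical helper name, and the set is called iso_set here only for clarity.

```python
iso_set = {'FR', 'UK', 'GER'}

def parse_codes(value):
    """Split a comma-separated string and strip whitespace from each part."""
    return [part.strip() for part in value.split(',') if part.strip()]

def bucketing(row):
    if row['NATIONALITY'] == 'RU':
        countries = parse_codes(row['party_other_list'])
        # isdisjoint is True when no parsed code appears in iso_set
        if iso_set.isdisjoint(countries):
            return 'high risk nationality'
        else:
            return 'Exempt EU national'
```

Empty parts, such as those left by a trailing comma, are dropped by the helper, so they cannot produce a false match.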