How can I access an element in my list, if it is joined by a special character?

Time:03-09

I am writing a function that checks whether a value is in a list and, if so, returns a comment, as shown below:

iso_list = ['FR','UK', 'GER']

def bucketing(row):
    if row['NATIONALITY'] == 'RU' and row['party_other_list'] in iso_list:
        return 'Exempt EU national'
    elif row['NATIONALITY'] == 'RU' and row['party_other_list'] not in iso_list:
        return 'high risk nationality'

The problem is that a few of the rows I want to check hold two values, e.g. the final row below:

party_other_list
FR
UK
UK,RU

Since UK is in my list, I want the last row to fall under my first condition. However, it appears here as part of a two-country value, 'UK,RU', so the membership test against iso_list fails.

How do I capture rows that hold two components, one of which is in my list?

CodePudding user response:

Split the string into a list, then use the any() function.

def bucketing(row):
    if row['NATIONALITY'] == 'RU':
        other_list = row['party_other_list'].split(',')
        if any(other in iso_list for other in other_list):
            return 'Exempt EU national'
        else:
            return 'high risk nationality'
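
To see how the split-and-any() version behaves on the question's sample values, here is a minimal sketch that uses plain dicts in place of DataFrame rows (an assumption, made only to keep the example self-contained). The strip() call is also an addition, not part of the answer above; it guards against values written as 'RU, UK'.

```python
iso_list = ['FR', 'UK', 'GER']

def bucketing(row):
    # Only RU nationals are bucketed; any other row falls through and returns None.
    if row['NATIONALITY'] == 'RU':
        # 'RU, UK' -> ['RU', 'UK']; strip() is an added safeguard against stray spaces.
        other_list = [code.strip() for code in row['party_other_list'].split(',')]
        if any(other in iso_list for other in other_list):
            return 'Exempt EU national'
        return 'high risk nationality'

# Hypothetical rows mirroring the question's sample column.
rows = [
    {'NATIONALITY': 'RU', 'party_other_list': 'FR'},
    {'NATIONALITY': 'RU', 'party_other_list': 'UK,RU'},
    {'NATIONALITY': 'RU', 'party_other_list': 'RU, UK'},
    {'NATIONALITY': 'RU', 'party_other_list': 'CN'},
]
results = [bucketing(row) for row in rows]
print(results)
```

With a DataFrame, the same function would typically be applied row-wise with df.apply(bucketing, axis=1).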

CodePudding user response:

You could use str.contains to check whether each value in "party_other_list" contains any of the country codes in iso_list. Then use numpy.select to pick a value depending on which condition is satisfied, instead of applying a function to each row.

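import numpy as np
# df is assumed to be the DataFrame holding the NATIONALITY and party_other_list columns
cond = df['party_other_list'].str.contains('|'.join(iso_list))
RU = df['NATIONALITY'].eq('RU')
# default=None leaves non-RU rows empty; an np.nan default does not mix cleanly with string choices
df['status'] = np.select([RU & cond, RU & ~cond], ['Exempt EU national', 'high risk nationality'], default=None)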

Output:

  party_other_list NATIONALITY              status
0               FR          RU  Exempt EU national
1               UK          RU  Exempt EU national
2            UK,RU          RU  Exempt EU national

Note that this is meant to replace bucketing entirely.
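
One caveat with joining the codes into a regex: str.contains performs a regex search by default, so 'FR|UK|GER' matches anywhere in the string, including inside a longer code. The sketch below uses Python's re module directly to illustrate the difference; the value 'GERX' and the word-boundary pattern are illustrative assumptions, not part of the answer above.

```python
import re

iso_list = ['FR', 'UK', 'GER']

# Pattern as built in the answer: matches a code anywhere in the string.
loose = '|'.join(iso_list)
# Assumed stricter variant: \b anchors each code to a whole token.
strict = r'\b(?:' + '|'.join(map(re.escape, iso_list)) + r')\b'

samples = ['UK,RU', 'GERX', 'RU']
loose_hits = [bool(re.search(loose, s)) for s in samples]
strict_hits = [bool(re.search(strict, s)) for s in samples]
print(loose_hits)   # [True, True, False]
print(strict_hits)  # [True, False, False]
```

If whole-token matching is what you need, the stricter pattern can be passed to str.contains in place of '|'.join(iso_list).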

CodePudding user response:

You can use the set method isdisjoint together with the string method split. I also restructured the if/else block so that the condition row['NATIONALITY'] == 'RU' is evaluated only once.

The following solution assumes that the countries in party_other_list are separated only by ',' (with no surrounding spaces).

iso_list = {'FR', 'UK', 'GER'}
def bucketing(row):
    if row['NATIONALITY'] == 'RU':
        countries = row['party_other_list'].split(',')  # 'UK,RU' -> ['UK', 'RU']
        # Note: the two branches are swapped relative to the question's version
        if iso_list.isdisjoint(countries):  # True if iso_list and countries are disjoint and False otherwise
            return 'high risk nationality'
        else:
            return 'Exempt EU national'
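
As a quick check of what isdisjoint returns, the sketch below runs it on two split values; the sample strings are illustrative. isdisjoint accepts any iterable, so the list produced by split can be passed directly.

```python
iso_list = {'FR', 'UK', 'GER'}

# isdisjoint returns True only when the two collections share no element.
dual_is_disjoint = iso_list.isdisjoint('UK,RU'.split(','))   # 'UK' is in both
other_is_disjoint = iso_list.isdisjoint('RU,CN'.split(','))  # nothing in common
print(dual_is_disjoint, other_is_disjoint)  # False True
```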