Check if one of the strings in one column is found in the other column

Time:10-06

I'm trying to check whether one of the strings in column Name B is found in column Name A, storing the result in a new column Name Check:

Current Inputs:

import pandas as pd

df = pd.DataFrame({"Name A":{"0":"John","1":"Sara","2":"Adam","3":"Ahmed"},
                   "Name B":{"0":"John, Geroge","1":"Ahemed, Sara","2":"Adam, Nadia","3":"Sara, John"},
                   "Salary":{"0":100,"1":200,"2":300,"3":400}})

    Name A  Name B        Salary
0   John    John, Geroge  100
1   Sara    Ahemed, Sara  200
2   Adam    Adam, Nadia   300
3   Ahmed   Sara, John    400

Expected Output:

    Name A  Name B        Salary  Name Check
0   John    John, Geroge  100     True
1   Sara    Ahemed, Sara  200     True
2   Adam    Adam, Nadia   300     True
3   Ahmed   Sara, John    400     False
4   Nadi    Sara, Nadia   500     True
5   George  Georg, Mo     600     True

What I have tried:

df['Name Check'] = df.apply(lambda x: x['Name B'] in x['Name A'] , axis=1)

But the output is all False. I'm not sure how to convert column Name B to a list and loop through it to check, one by one, whether each item is found in column Name A.

CodePudding user response:

Here is an approach using a regex with word boundaries:

import re
# re.escape keeps any regex metacharacters in the name literal
df['Name Check'] = df.apply(lambda r: bool(re.search(r'\b%s\b' % re.escape(r['Name A']), r['Name B'])), axis=1)

Explanation: this builds a regex per row of the form \bJohn\b, so the name in Name A must match a whole word in Name B rather than part of a longer name.
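To illustrate what the word boundaries do, here is a minimal sketch. It assumes the four input rows from the question plus the two extra rows that appear only in the expected-output table; the helper name whole_word_match and the use of re.escape are additions for illustration, not part of the original answer.

```python
import re
import pandas as pd

# Sample data: rows 0-3 from the question's input, rows 4-5 taken from
# the expected-output table (an assumption about the full input).
df = pd.DataFrame({
    "Name A": ["John", "Sara", "Adam", "Ahmed", "Nadi", "George"],
    "Name B": ["John, Geroge", "Ahemed, Sara", "Adam, Nadia",
               "Sara, John", "Sara, Nadia", "Georg, Mo"],
    "Salary": [100, 200, 300, 400, 500, 600],
})

def whole_word_match(row):
    # Hypothetical helper: \b on both sides requires the name to appear
    # as a complete word; re.escape keeps metacharacters literal.
    pattern = r"\b" + re.escape(row["Name A"]) + r"\b"
    return re.search(pattern, row["Name B"]) is not None

regex_check = df.apply(whole_word_match, axis=1)
print(regex_check.tolist())
# [True, True, True, False, False, False]
```

Note that "Nadi" does not match "Nadia" and "George" does not match "Georg" here, so this approach would not reproduce rows 4 and 5 of the expected output; it only fits if whole-name matches are what is wanted.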

CodePudding user response:

If the values can be split on a comma followed by optional whitespace, use Series.str.split with DataFrame.isin and DataFrame.any:

df['Name Check'] = (df['Name B'].str.split(r',\s*', expand=True)
                                .isin(df['Name A']).any(axis=1))
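One detail worth spelling out: when DataFrame.isin receives a Series, pandas aligns it on the index, so each row of split names is compared only with the Name A value of the same row. The sketch below, using the question's four rows with a default integer index (an assumption; the question uses string labels), contrasts that with passing a plain list, which ignores row alignment. The variable names parts, row_wise, and column_wide are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name A": ["John", "Sara", "Adam", "Ahmed"],
    "Name B": ["John, Geroge", "Ahemed, Sara", "Adam, Nadia", "Sara, John"],
    "Salary": [100, 200, 300, 400],
})

# One column per name after splitting on a comma plus optional whitespace.
parts = df["Name B"].str.split(r",\s*", expand=True)

# Series argument: aligned on the index, so the comparison is row by row.
row_wise = parts.isin(df["Name A"]).any(axis=1)

# List argument: membership against every value of Name A, regardless of row.
column_wide = parts.isin(df["Name A"].tolist()).any(axis=1)

print(row_wise.tolist())     # [True, True, True, False]
print(column_wide.tolist())  # [True, True, True, True]
```

Row 3 shows the difference: "Sara" and "John" both occur somewhere in Name A, but neither equals "Ahmed" in the same row, so only the Series form returns False there.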

To also match partial substrings in either direction (for example "Nadi" in "Nadia", or "Georg" in "George"), use:

f = lambda x: any(y in x['Name A'] or x['Name A'] in y for y in x['Name B'].split(', '))
df['Name Check1'] = df.apply(f, axis=1)
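As a quick check, the sketch below applies the same substring test to all six rows of the expected-output table. It assumes rows 4 and 5 of that table are part of the input, and the function name partial_match is a hypothetical stand-in for the lambda above.

```python
import pandas as pd

# Rows 0-3 from the question's input; rows 4-5 assumed from the
# expected-output table.
df = pd.DataFrame({
    "Name A": ["John", "Sara", "Adam", "Ahmed", "Nadi", "George"],
    "Name B": ["John, Geroge", "Ahemed, Sara", "Adam, Nadia",
               "Sara, John", "Sara, Nadia", "Georg, Mo"],
    "Salary": [100, 200, 300, 400, 500, 600],
})

def partial_match(row):
    # True if any name in Name B contains Name A, or is contained in it.
    name = row["Name A"]
    return any(part in name or name in part
               for part in row["Name B"].split(", "))

df["Name Check1"] = df.apply(partial_match, axis=1)
print(df["Name Check1"].tolist())
# [True, True, True, False, True, True]
```

This reproduces the Name Check column from the expected output. The trade-off is that substring matching is loose: a short name such as "Al" would also match "Alan" or "Sally".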