I'm trying to check whether any of the strings in column Name B is found in column Name A, storing the result in a new column Name Check.
Current Inputs:
df = pd.DataFrame({"Name A":{"0":"John","1":"Sara","2":"Adam","3":"Ahmed"},
"Name B":{"0":"John, Geroge","1":"Ahemed, Sara","2":"Adam, Nadia","3":"Sara, John"},
"Salary":{"0":100,"1":200,"2":300,"3":400}})
  Name A        Name B  Salary
0   John  John, Geroge     100
1   Sara  Ahemed, Sara     200
2   Adam   Adam, Nadia     300
3  Ahmed    Sara, John     400
Expected output:
   Name A        Name B  Salary  Name Check
0    John  John, Geroge     100        True
1    Sara  Ahemed, Sara     200        True
2    Adam   Adam, Nadia     300        True
3   Ahmed    Sara, John     400       False
4    Nadi   Sara, Nadia     500        True
5  George     Georg, Mo     600        True
What I have tried:
df['Name Check'] = df.apply(lambda x: x['Name B'] in x['Name A'] , axis=1)
But the output is all False. I'm not sure how to convert column Name B to a list and loop through it, checking one by one whether each entry is found in column Name A.
CodePudding user response:
Here is an approach using a regex with word boundaries:
import re
df['Name Check'] = df.apply(lambda r: bool(re.search(r'\b%s\b' % r['Name A'], r['Name B'])), axis=1)
Explanation: this builds a regex per row of the form \bJohn\b, where the \b word boundaries ensure that only whole-word matches count (John would not match Johnny).
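For anyone who wants to run this end to end, here is a minimal sketch on the sample data from the question. The helper name name_in_list is made up for illustration, and the re.escape call is an added precaution that is not in the one-liner above: it makes names containing regex metacharacters match literally.

```python
import re
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Name A": ["John", "Sara", "Adam", "Ahmed"],
    "Name B": ["John, Geroge", "Ahemed, Sara", "Adam, Nadia", "Sara, John"],
    "Salary": [100, 200, 300, 400],
})

def name_in_list(row):
    # Build a pattern such as \bJohn\b; re.escape (an added precaution)
    # keeps any regex metacharacters in the name literal.
    pattern = r"\b%s\b" % re.escape(row["Name A"])
    return bool(re.search(pattern, row["Name B"]))

df["Name Check"] = df.apply(name_in_list, axis=1)
print(df["Name Check"].tolist())  # [True, True, True, False]
```

Row 3 is False because Ahmed does not occur as a whole word in "Sara, John".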
CodePudding user response:
If the values can be split on a comma followed by optional whitespace, use Series.str.split together with DataFrame.isin and DataFrame.any:
df['Name Check'] = (df['Name B'].str.split(r',\s*', expand=True)
                    .isin(df['Name A']).any(axis=1))
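A small sketch of the same idea on the question's sample data, with the intermediate frame kept in a variable (parts, a name chosen here for illustration) so each step can be inspected. One detail worth knowing: when DataFrame.isin is given a Series, it aligns on the index, so the pieces in each row are compared only with the Name A value of that same row.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Name A": ["John", "Sara", "Adam", "Ahmed"],
    "Name B": ["John, Geroge", "Ahemed, Sara", "Adam, Nadia", "Sara, John"],
    "Salary": [100, 200, 300, 400],
})

# One column per comma-separated piece (comma plus optional whitespace)
parts = df["Name B"].str.split(r",\s*", expand=True)

# isin with a Series aligns on the index: row i is compared with Name A of row i
matches = parts.isin(df["Name A"])

# True if any piece in the row equals the row's Name A
df["Name Check"] = matches.any(axis=1)
print(df["Name Check"].tolist())  # [True, True, True, False]
```

This is an exact comparison of whole pieces, so it would not flag partial matches such as Nadi against Nadia.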
To test for substring matches against the split values instead, use:
f = lambda x: any(y in x['Name A'] or x['Name A'] in y for y in x['Name B'].split(', '))
df['Name Check1'] = df.apply(f, axis=1)
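To illustrate how the substring test differs from exact matching, the sketch below extends the sample data with the two extra rows that appear in the question's expected output (Nadi and George). The function name partial_match is hypothetical; the logic is the same as the lambda above.

```python
import pandas as pd

# Sample data from the question, plus rows 4 and 5 from its expected output
df = pd.DataFrame({
    "Name A": ["John", "Sara", "Adam", "Ahmed", "Nadi", "George"],
    "Name B": ["John, Geroge", "Ahemed, Sara", "Adam, Nadia",
               "Sara, John", "Sara, Nadia", "Georg, Mo"],
    "Salary": [100, 200, 300, 400, 500, 600],
})

def partial_match(row):
    name = row["Name A"]
    # True if any piece contains the name, or the name contains the piece
    return any(piece in name or name in piece
               for piece in row["Name B"].split(", "))

df["Name Check1"] = df.apply(partial_match, axis=1)
print(df["Name Check1"].tolist())  # [True, True, True, False, True, True]
```

Rows 4 and 5 come out True only because the check works in both directions: Nadi is contained in Nadia, and Georg is contained in George.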