I have two dataframes. I need to find a match and return results in another column using below criteria.
df1 = pd.DataFrame(
{
"Keywords": ["SYS", "SYS2", "SYS3"]
}
df2 = pd.DataFrame(
{
"Lookup": ["TEST SYSTEM", "SYS", "DUMMY" , "THIS IS SYS3"]
}
My expected end result is
df2 = pd.DataFrame(
{
"LookupResults": ["SYS", "THIS IS SYS3"]
}
Basically i need to find those columns with full strings that match my keywords. Note i dont want TEST SYSTEM as my result. i.e no partial.
Have tried this so far. --Convert the keywords column to list
findwords = df['Keywords'].values
--Split the Lookup strings into a list
df2['words'] = [set(words) for words in
df2['Lookup'].str.strip().str.split()]
--Search using below
df2['match'] = df2.words.apply(lambda words: all(target_word in words for target_word in findwords))
I am not getting desired result . However if I do something like findwords = ['SYS'] i am getting desired result.
Clearly i am a novice and missing some basics. Any help is appreciated. Thanks
CodePudding user response:
# define the pattern from Keywords in df1
# \b : word boundary
pat='\\b(' '|'.join(df1['Keywords'].values) ')\\b'
p
'\\b(SYS|SYS2|SYS3)\\b'
# extract pattern and filter using loc
df2.loc[df2['Lookup'].str.extract(pat)[0].notna()]
Lookup
1 SYS
3 THIS IS SYS3