Home > other >  Find a match in a column from list of strings from another column
Find a match in a column from list of strings from another column

Time:11-04

I have two dataframes. I need to find a match and return results in another column using below criteria.

df1 = pd.DataFrame(
    {
        "Keywords": ["SYS", "SYS2", "SYS3"]
    }

df2 = pd.DataFrame(
    {
        "Lookup": ["TEST SYSTEM", "SYS", "DUMMY" , "THIS IS SYS3"]
    }

My expected end result is

df2 = pd.DataFrame(
    {
        "LookupResults": ["SYS", "THIS IS SYS3"]
    }

Basically i need to find those columns with full strings that match my keywords. Note i dont want TEST SYSTEM as my result. i.e no partial.

Have tried this so far. --Convert the keywords column to list

findwords = df['Keywords'].values

--Split the Lookup strings into a list

df2['words'] =  [set(words) for words in 
df2['Lookup'].str.strip().str.split()]

--Search using below

df2['match'] = df2.words.apply(lambda words: all(target_word in words for target_word in findwords))

I am not getting desired result . However if I do something like findwords = ['SYS'] i am getting desired result.

Clearly i am a novice and missing some basics. Any help is appreciated. Thanks

CodePudding user response:

# define the pattern from Keywords in df1
# \b : word boundary

pat='\\b('  '|'.join(df1['Keywords'].values)  ')\\b'
p
'\\b(SYS|SYS2|SYS3)\\b'
# extract pattern and filter using loc
df2.loc[df2['Lookup'].str.extract(pat)[0].notna()]


    Lookup
1   SYS
3   THIS IS SYS3
  • Related