As stated in title, I have a dataframe (let's call this df1) that is similar to this:
Col A | Desc |
---|---|
00001 | Dog |
00002 | dogs |
00003 | cat |
00004 | cats |
00005 | hooman |
I have a list of keywords I want to search, in a second dataframe, df2:
Keyword |
---|
dog |
cats |
bird |
How do I identify all records in df1 that has at least one keyword match from df2, and the final outcome is a dataframe (new or add to df1) that lists all the columns in df1 the matched keyword? On top of that... ideally case insensitive, and the keyword list entry "dog" would help me also find "dogs" from df1?
Sample Expected Output:
Col A | Desc | Matched Keyword |
---|---|---|
00001 | Dog | dog |
00002 | dogs | dog |
00003 | cat | |
00004 | cats | cats |
00005 | hooman |
I've searched for some time in this site, here are a few other ones I have tried to follow but none of them actually worked. I always get nothing matched.
search dataframe for a keyword in any column and get the rows value matching between two DataFrames using pandas in python searching if anyone of word is present in the another column of a dataframe or in another data frame using python How to search for a keyword in different pandas dataframe and update or create a new column with matching keyword in parent DF
Any help would be great, thanks!
CodePudding user response:
import pandas as pd
from typing import List
df1 = pd.DataFrame({'col1': ["0001","0002","0003","0004","0005"], 'values':["dogs","cat","Dog","cats","hooman"]})
df2 = pd.DataFrame({"Keywords": ['dog','cat','bird']})
def find_string_in_substring(value:str, list_of_strings: List[str]):
for sub_value in list_of_strings:
if value.lower() in sub_value.lower() or sub_value.lower() in value.lower():
return sub_value
return False
df1["keyword_from_df2"] = df1["values"].apply(lambda x : find_string_in_substring(x,df2['Keywords'].tolist()))
df1
The logic is pretty straight forward, hope it is good enough, if not I will try to help better!