Home > Mobile >  pandas DF with 1 or more columns to search for keywords, keywords to search for in a second DF; case
pandas DF with 1 or more columns to search for keywords, keywords to search for in a second DF; case

Time:07-09

As stated in title, I have a dataframe (let's call this df1) that is similar to this:

Col A Desc
00001 Dog
00002 dogs
00003 cat
00004 cats
00005 hooman

I have a list of keywords I want to search, in a second dataframe, df2:

Keyword
dog
cats
bird

How do I identify all records in df1 that has at least one keyword match from df2, and the final outcome is a dataframe (new or add to df1) that lists all the columns in df1 the matched keyword? On top of that... ideally case insensitive, and the keyword list entry "dog" would help me also find "dogs" from df1?

Sample Expected Output:

Col A Desc Matched Keyword
00001 Dog dog
00002 dogs dog
00003 cat
00004 cats cats
00005 hooman

I've searched for some time in this site, here are a few other ones I have tried to follow but none of them actually worked. I always get nothing matched.

search dataframe for a keyword in any column and get the rows value matching between two DataFrames using pandas in python searching if anyone of word is present in the another column of a dataframe or in another data frame using python How to search for a keyword in different pandas dataframe and update or create a new column with matching keyword in parent DF

Any help would be great, thanks!

CodePudding user response:

    import pandas as pd
from typing import List

df1 = pd.DataFrame({'col1': ["0001","0002","0003","0004","0005"], 'values':["dogs","cat","Dog","cats","hooman"]})
df2 = pd.DataFrame({"Keywords": ['dog','cat','bird']})

def find_string_in_substring(value:str, list_of_strings: List[str]):
    for sub_value in list_of_strings:
        if value.lower() in sub_value.lower() or sub_value.lower() in value.lower():
            return sub_value
    return False

df1["keyword_from_df2"] = df1["values"].apply(lambda x : find_string_in_substring(x,df2['Keywords'].tolist()))
df1

The logic is pretty straight forward, hope it is good enough, if not I will try to help better!

  • Related