Home > Blockchain >  How to search for a keyword in different pandas dataframe and update or create a new column with mat
How to search for a keyword in different pandas dataframe and update or create a new column with mat

Time:10-01

Example, parent pandas df is as follow

enter image description here

Now I have another df, df_keywords

enter image description here

What I want to do is, check if any of the keyword from df_keywords['Keywords to Search'] is present is df['Description'] and create a new column as df['Keywords'], for example..

enter image description here

CodePudding user response:

One way to do it is to create a helper column on df with the keyword matching Keywords to Search in df_keywords. Then, merge df with df_keywords on the keywords, as follows:

keywords = df_keywords['Keywords to Search'].dropna().unique()

df['keyword'] = df['Description'].map(lambda x: np.nan if (pd.isna(x) or len(w:=[y for y in keywords if y in str(x)]) == 0) else w[0])

Result:

print(df)

     Description keyword
0   I have a dog     dog
1   I have a cat     cat
2  I want coffee  coffee
3   I need sleep   sleep

Then, merge the 2 dataframes with .merge() matching the keywords:

df_out = df.merge(df_keywords, left_on='keyword', right_on='Keywords to Search')

Result:

print(df_out)

     Description keyword Keywords to Search Keywords to Update
0   I have a dog     dog                dog        dog related
1   I have a cat     cat                cat        cat related
2  I want coffee  coffee             coffee     coffee related
3   I need sleep   sleep              sleep      sleep related

Finally, remove unwanted columns and rename column,

df_out = df_out.drop(['keyword', 'Keywords to Search'], axis=1).rename({'Keywords to Update': 'Keywords'}, axis=1)

Result:

print(df_out)

     Description        Keywords
0   I have a dog     dog related
1   I have a cat     cat related
2  I want coffee  coffee related
3   I need sleep   sleep related

CodePudding user response:

Here's how I've done it. This won't work if description contains more than 1 keyword, and may not be the fasted if you have a lot of words to check.

What I've done is create a blank 'Keywords to Search' column in df_parent, then gone through each keyword, and checked if the keyword exists in in the 'Description' column using df_parent['Description'].str.contains(keyword). Then I've replaced any True values with the keyword, and false with nan and used this new series to fill in the Nan's in my new 'Keywords to Search' column in the parent dataframe. This then returns a filled in 'Keywords to Search' column in df_parent with the keyword corresponding to the description of that row. Then I've merged df_parent and df_keywords on the 'Keywords to Search' column, and then dropped it to get the desired output.

import pandas as pd
import numpy as np

df_parent = pd.DataFrame({'Description':['I have a dog', 'I have a cat', 'I want coffee', 'I need sleep']})

df_keywords = pd.DataFrame({'Keywords to Search' : ['cat', 'sleep', 'dog', 'coffee'],
                            'Keywords to Update' : ['cat related', 'sleep related', 'dog related', 'coffee related']})

df_parent['Keywords to Search'] = np.nan
for keyword in df_keywords['Keywords to Search']:
    df_parent['Keywords to Search'].fillna(df_parent['Description'].str.contains(keyword)
                                        .replace({True:keyword, False:np.nan}), inplace=True)

output = df_parent.merge(df_keywords, on='Keywords to Search').drop('Keywords to Search', axis=1)

output:

     Description Keywords to Update
0   I have a dog        dog related
1   I have a cat        cat related
2  I want coffee     coffee related
3   I need sleep      sleep related
  • Related