Example, parent pandas df is as follow
Now I have another df, df_keywords
What I want to do is, check if any of the keyword from df_keywords['Keywords to Search'] is present is df['Description'] and create a new column as df['Keywords'], for example..
CodePudding user response:
One way to do it is to create a helper column on df
with the keyword matching Keywords to Search
in df_keywords
. Then, merge df
with df_keywords
on the keywords, as follows:
keywords = df_keywords['Keywords to Search'].dropna().unique()
df['keyword'] = df['Description'].map(lambda x: np.nan if (pd.isna(x) or len(w:=[y for y in keywords if y in str(x)]) == 0) else w[0])
Result:
print(df)
Description keyword
0 I have a dog dog
1 I have a cat cat
2 I want coffee coffee
3 I need sleep sleep
Then, merge the 2 dataframes with .merge()
matching the keywords:
df_out = df.merge(df_keywords, left_on='keyword', right_on='Keywords to Search')
Result:
print(df_out)
Description keyword Keywords to Search Keywords to Update
0 I have a dog dog dog dog related
1 I have a cat cat cat cat related
2 I want coffee coffee coffee coffee related
3 I need sleep sleep sleep sleep related
Finally, remove unwanted columns and rename column,
df_out = df_out.drop(['keyword', 'Keywords to Search'], axis=1).rename({'Keywords to Update': 'Keywords'}, axis=1)
Result:
print(df_out)
Description Keywords
0 I have a dog dog related
1 I have a cat cat related
2 I want coffee coffee related
3 I need sleep sleep related
CodePudding user response:
Here's how I've done it. This won't work if description contains more than 1 keyword, and may not be the fasted if you have a lot of words to check.
What I've done is create a blank 'Keywords to Search' column in df_parent
, then gone through each keyword, and checked if the keyword exists in in the 'Description' column using df_parent['Description'].str.contains(keyword)
. Then I've replaced any True values with the keyword, and false with nan and used this new series to fill in the Nan's in my new 'Keywords to Search' column in the parent dataframe. This then returns a filled in 'Keywords to Search' column in df_parent
with the keyword corresponding to the description of that row. Then I've merged df_parent
and df_keywords
on the 'Keywords to Search' column, and then dropped it to get the desired output.
import pandas as pd
import numpy as np
df_parent = pd.DataFrame({'Description':['I have a dog', 'I have a cat', 'I want coffee', 'I need sleep']})
df_keywords = pd.DataFrame({'Keywords to Search' : ['cat', 'sleep', 'dog', 'coffee'],
'Keywords to Update' : ['cat related', 'sleep related', 'dog related', 'coffee related']})
df_parent['Keywords to Search'] = np.nan
for keyword in df_keywords['Keywords to Search']:
df_parent['Keywords to Search'].fillna(df_parent['Description'].str.contains(keyword)
.replace({True:keyword, False:np.nan}), inplace=True)
output = df_parent.merge(df_keywords, on='Keywords to Search').drop('Keywords to Search', axis=1)
output:
Description Keywords to Update
0 I have a dog dog related
1 I have a cat cat related
2 I want coffee coffee related
3 I need sleep sleep related