The dataframe below has two columns: text, which holds sentences, and keyword, which holds a list of keywords that I want to use to filter the text column.
I'm trying to filter the rows based on the keyword column. If any of the words in a row's keyword list occurs in that row's text, the row should be kept; otherwise it should be dropped.
The output dataframe should look like this.
I tried the str.contains() function in pandas, but that is not the right tool here because it expects a single string or regex pattern, not a column of lists.
df['text'].str.contains(df['keyword'].str)
I got the error below:
TypeError: first argument must be string or compiled pattern
CodePudding user response:
With the builtin any function (to check whether any keyword in the list occurs within the text):
df = df[df.apply(lambda x: any(k in x.text for k in x.keyword), axis=1)]
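Here is a minimal, self-contained sketch of the same idea so it can be run end to end. The sample rows are made up for illustration (the question's dataframe is not shown), and the names mask, filtered, and whole_word_mask are only placeholders. Note that `k in text` is a substring test, so the second mask shows one possible whole-word variant in case that is what is actually needed.

```python
import pandas as pd

# Hypothetical sample data; the dataframe from the question is not reproduced here.
df = pd.DataFrame({
    "text": [
        "the cat sat on the mat",
        "dogs bark at night",
        "birds sing in the morning",
    ],
    "keyword": [
        ["cat", "fish"],
        ["cow", "horse"],
        ["morning"],
    ],
})

# Row-wise check: keep the row if at least one keyword is a substring of the text.
mask = df.apply(lambda row: any(k in row["text"] for k in row["keyword"]), axis=1)
filtered = df[mask]

# Variant (an assumption about the desired behavior): match whole words only,
# by intersecting the set of whitespace-separated words with the keyword list.
whole_word_mask = df.apply(
    lambda row: bool(set(row["text"].split()) & set(row["keyword"])), axis=1
)

print(filtered)
```

With this sample data both masks keep the first and third rows. They would differ on a row such as text "a category list" with keyword ["cat"], which the substring check keeps and the whole-word check drops.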