I have a DataFrame with a column that contains a paragraph of text, and I have created a list of keywords. I would like to compare the keywords against the column text and return the words that match. An example:
keywords = ['yellow', 'orange', 'purple', 'pink']
df = pd.DataFrame({'colours': ['my favourite colour is purple but sometimes pink', 'I have a yellow dinosaur', 'all flowers are red']})
I ran this code:
df['match_colours'] = df.apply(lambda x: any(word in x.colours for word in keywords), axis=1)
That produces a column that is True if there is a match and False otherwise. I just need an additional column that specifies which words matched.
Thank you!
CodePudding user response:
You can use a list comprehension to add the column.
df['colour_res'] = [[i for i in keywords if i in row] for row in df.colours]
colours ... colour_res
0 my favourite colour is purple but sometimes pink ... [purple, pink]
1 I have a yellow dinosaur ... [yellow]
2 all flowers are red ... []
[3 rows x 3 columns]
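One thing to watch with the `in` check is that it matches substrings, so 'pink' would also match inside 'pinkish'. If only whole-word matches are wanted (that requirement is my assumption, it is not stated in the question), a rough sketch with a regular expression and `Series.str.findall` could look like this, reusing the sample data and the `colour_res` column name from above:

```python
import re
import pandas as pd

keywords = ['yellow', 'orange', 'purple', 'pink']
df = pd.DataFrame({'colours': [
    'my favourite colour is purple but sometimes pink',
    'I have a yellow dinosaur',
    'all flowers are red',
]})

# Build one pattern of the form \b(?:yellow|orange|purple|pink)\b.
# re.escape guards against keywords that contain regex metacharacters.
pattern = r'\b(?:' + '|'.join(re.escape(k) for k in keywords) + r')\b'

# str.findall returns, for each row, a list of the matches found in the text.
df['colour_res'] = df['colours'].str.findall(pattern)
print(df)
```

Unlike the list comprehension, this lists the matches in the order they appear in the text rather than in the order of `keywords`.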
CodePudding user response:
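def custom_func(x):
    # Return the first keyword (in list order) found in the text, or None
    for i in keywords:
        if i in x:
            return i
    return None

# Use bracket assignment; df.col1 = ... would not create a new column
df['col1'] = df.colours.apply(custom_func)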
Hope this works
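Note that the function above stops at the first keyword it finds, so a row containing both 'purple' and 'pink' only reports one of them. If every match is needed in a single string column, a possible variant is sketched below. The name `all_matches` and the comma-separated output format are just illustrative choices, not something from the question:

```python
import pandas as pd

keywords = ['yellow', 'orange', 'purple', 'pink']
df = pd.DataFrame({'colours': [
    'my favourite colour is purple but sometimes pink',
    'I have a yellow dinosaur',
    'all flowers are red',
]})

def all_matches(text, words=keywords):
    """Return every keyword found in text as one comma-separated string.

    Returns None when no keyword is present (hypothetical helper).
    """
    found = [w for w in words if w in text]
    return ', '.join(found) if found else None

# Apply the helper to each row of the text column.
df['col1'] = df['colours'].apply(all_matches)
print(df)
```

Rows without any keyword end up with a missing value in `col1`, which makes them easy to filter with `df['col1'].notna()`.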