Match keywords to paragraph dataframe

Time:10-14

I have a DataFrame with a column of paragraph text, and I have created a list of keywords. I would like to compare the keywords against the column text and return the words that match. An example is below:

import pandas as pd

keywords = ['yellow', 'orange', 'purple', 'pink']

df = pd.DataFrame({'colours': ['my favourite colour is purple but sometimes pink', 'I have a yellow dinosaur', 'all flowers are red']})

I ran this code:

df['match_colours'] = df.apply(lambda x: any(word in x.colours for word in keywords), axis=1)

That returned a column that is True where there is a match and False where there isn't. I just need an additional column that specifies which words matched.

Thank you!

CodePudding user response:

You can use a list comprehension to add the column.

df['colour_res'] = [[i for i in keywords if i in row] for row in df.colours]

                                            colours  ...      colour_res
0  my favourite colour is purple but sometimes pink  ...  [purple, pink]
1                          I have a yellow dinosaur  ...        [yellow]
2                               all flowers are red  ...              []

[3 rows x 3 columns]
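
One thing to be aware of: the "in" test in the list comprehension above checks for substrings, so a keyword like 'pink' would also match inside 'pinkish'. If whole-word matches are what you want, a regex with word boundaries may be a better fit. Below is a minimal sketch, assuming case-insensitive matching is acceptable. The fourth row ('a pinkish sunset') is a made-up example added only to show the difference; it is not part of the question's data.

```python
import re

import pandas as pd

keywords = ['yellow', 'orange', 'purple', 'pink']

df = pd.DataFrame({'colours': [
    'my favourite colour is purple but sometimes pink',
    'I have a yellow dinosaur',
    'all flowers are red',
    'a pinkish sunset',  # hypothetical extra row, not from the question
]})

# Build a single pattern like r'\b(?:yellow|orange|purple|pink)\b'.
# re.escape guards against keywords that contain regex metacharacters.
pattern = r'\b(?:' + '|'.join(re.escape(k) for k in keywords) + r')\b'

# For each row, collect every non-overlapping whole-word match as a list.
df['colour_res'] = df['colours'].str.findall(pattern, flags=re.IGNORECASE)

print(df)
```

With this approach the matches come back in the order they appear in the text, and a keyword that occurs twice in a row's text is listed twice.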

CodePudding user response:

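def custom_func(x):
    # Return the first keyword (in list order) found in the text, or None if nothing matches.
    for i in keywords:
        if i in x:
            return i
    return None

# Use bracket assignment: df.col1 = ... would not create a new column.
df['col1'] = df.colours.apply(custom_func)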

Hope this works
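
Note that custom_func stops at the first keyword it finds, so a row containing both 'purple' and 'pink' would only report one of them. If every match is needed, a variant along these lines might help. This is only a sketch: the helper name all_matches and the column name matches are made up for illustration.

```python
import pandas as pd

keywords = ['yellow', 'orange', 'purple', 'pink']

df = pd.DataFrame({'colours': [
    'my favourite colour is purple but sometimes pink',
    'I have a yellow dinosaur',
    'all flowers are red',
]})


def all_matches(text):
    # Hypothetical helper: like custom_func, but keeps every keyword found
    # (substring test, results in keyword-list order).
    return [k for k in keywords if k in text]


df['matches'] = df['colours'].apply(all_matches)

# An empty list is falsy, so this reproduces the True/False column from the question.
df['match_colours'] = df['matches'].apply(bool)

print(df)
```

Keeping the list and deriving the flag from it means the two columns cannot disagree.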
