How to return keywords in a list from multiple sentences?

Time:04-04

I have a list of keywords that I want to match against a list of sentences. If a keyword is found within a sentence, I want to return it in a list.

What I have tried:

sentence = df['List of Content']
list_of_words = ['keyword1','keyword2', 'keyword3']

This works if I choose only one row:

[word for word in list_of_words if word in sentence[0]]

and outputs

output: ['keyword1', 'keyword3']

The desired output for all rows is a list of the keywords that match in each sentence, something like this:

matching_keywords = [['keyword1', 'keyword3'], ['keyword2', 'keyword3'], ['keyword1', 'keyword2'], ...]

However, when I run the same comprehension over the entire list, it just outputs an empty list [].


I have also tried a nested for loop:

kwords = []
for row in MCC:
    for x in list_of_words:
        if x in row:
            kwords.append(x)

It either gives me an empty list [] again, or it creates one long list with the keywords repeating themselves.

What mistake am I making? Can anyone help me with the logic or a solution?

CodePudding user response:

You could extend your initial approach by doing the following.

[[word for word in list_of_words if word in row] for row in sentence]

Explanation: This amounts to a nested list comprehension. For each row, we want a list of the keywords that appear in that row. As a list comprehension, this can be written as

[<list of keywords in row> for row in sentence]

On the other hand, if you have a specific row that you're looking at (for instance, row = sentence[0]), then, as you state in your question, the list of keywords that appear in this row can be obtained with [word for word in list_of_words if word in row]. Putting the two together leads to the result I wrote above, namely

[[word for word in list_of_words if word in row] for row in sentence]
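As a rough illustration, here is a minimal sketch of the nested comprehension next to an equivalent explicit loop. The three sample sentences are made up and stand in for df['List of Content']. The loop version creates a fresh inner list for every row, which is the step the single flat kwords list in the question was missing.

```python
# Hypothetical sample data standing in for df['List of Content'].
sentence = [
    "keyword1 appears here together with keyword3",
    "this one has keyword2 and keyword3",
    "nothing relevant in this row",
]
list_of_words = ['keyword1', 'keyword2', 'keyword3']

# Nested comprehension: one inner list of matches per row.
matching_keywords = [
    [word for word in list_of_words if word in row]
    for row in sentence
]

# Equivalent explicit loop: a new inner list is started for each row,
# then appended to the outer list once the row has been checked.
matching_loop = []
for row in sentence:
    found = []
    for word in list_of_words:
        if word in row:
            found.append(word)
    matching_loop.append(found)

print(matching_keywords)
# [['keyword1', 'keyword3'], ['keyword2', 'keyword3'], []]
```

Rows with no matches produce an empty inner list, so the result always has one entry per row.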

CodePudding user response:

Because you are using pandas, you can use apply as shown below:

df['List of Content'].apply(lambda x : [i for i in x.split() if i in list_of_words]).tolist()
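One thing worth noting is that x.split() compares whole whitespace-separated tokens, whereas word in row in the first answer does a substring test, so the two approaches can give different results. The sketch below shows the difference on a made-up sentence. The helper names match_tokens and match_substrings are hypothetical and only used for illustration.

```python
list_of_words = ['keyword1', 'keyword2', 'keyword3']

def match_tokens(text, keywords):
    # Whole-token matching, as in the apply() one-liner:
    # results follow the order of the tokens in the sentence.
    return [tok for tok in text.split() if tok in keywords]

def match_substrings(text, keywords):
    # Substring matching, as in the list-comprehension answer:
    # results follow the order of the keyword list.
    return [kw for kw in keywords if kw in text]

# Hypothetical sentence: the trailing period is attached to the last token.
text = "keyword3 first, then keyword1."

print(match_tokens(text, list_of_words))      # ['keyword3']
print(match_substrings(text, list_of_words))  # ['keyword1', 'keyword3']
```

Either helper could be passed to apply, for example df['List of Content'].apply(lambda x: match_substrings(x, list_of_words)).tolist(), depending on which matching behavior is wanted.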