How to not double print when matching multiple words in a sentence

Time:09-08

Say we have the following sentences and words to match:

sentences = ['There are three apples and oranges in the fridge.', 'I forgot the milk.']
wordsMatch = ['apples', 'bananas', 'fridge']

How can I iterate through the sentences and print each one that contains a match, without printing it two or three times when more than one word matches?

For instance, the following code will print the first sentence twice because it finds apples and fridge:

matchedSentences = [sentence for sentence in sentences for word in wordsMatch if word in sentence]

# output: ['There are three apples and oranges in the fridge.', 'There are three apples and oranges in the fridge.']

What solution would be most appropriate?

CodePudding user response:

output = [
    sentence 
    for sentence in sentences 
    if any(word in sentence for word in wordsMatch)
]

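The any function returns True as soon as one element of the iterable it is given is truthy. Each sentence is therefore tested once and added to the result at most once, no matter how many of the words it contains. So we can simply loop over the sentences and keep those that contain any of the words.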
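To see this on the data from the question, here is a small self-contained sketch. The function name matching_sentences is only an illustrative wrapper, not something from the question:

```python
sentences = [
    "There are three apples and oranges in the fridge.",
    "I forgot the milk.",
]
wordsMatch = ["apples", "bananas", "fridge"]


def matching_sentences(sentences, words):
    """Return each sentence that contains at least one of the words, once."""
    # any() stops at the first word found, so a sentence with several
    # matching words still produces a single entry.
    return [s for s in sentences if any(w in s for w in words)]


output = matching_sentences(sentences, wordsMatch)
print(output)
# ['There are three apples and oranges in the fridge.']
```

The first sentence contains both apples and fridge but appears only once in the result.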

PS:

I am assuming that the solution should be case-sensitive. If not, you can adjust the word in sentence-check accordingly:

output = [
    sentence
    for sentence in sentences
    if any(word.lower() in sentence.lower() for word in wordsMatch)
]
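Note that word in sentence is a substring test, so apples would also be found inside pineapples. If whole-word matching is what you want (an assumption, the question does not say), a regular expression with word boundaries is one option. A minimal sketch; the helper contains_whole_word and the pineapples sentence are made up for illustration:

```python
import re

wordsMatch = ["apples", "bananas", "fridge"]

# One alternation pattern for all words. re.escape guards against regex
# metacharacters in the words; \b restricts matches to whole words.
pattern = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in wordsMatch) + r")\b",
    re.IGNORECASE,  # case-insensitive, like the .lower() variant
)


def contains_whole_word(sentence):
    """Hypothetical helper: True if any word occurs as a whole word."""
    return pattern.search(sentence) is not None


samples = [
    "There are three apples and oranges in the fridge.",
    "I like pineapples.",  # made-up sentence: substring hit only
    "I forgot the milk.",
]
output = [s for s in samples if contains_whole_word(s)]
print(output)
# ['There are three apples and oranges in the fridge.']
```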

PPS: (just for fun)

Here is how to do it in-place with the sentences list:

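removed = 0
# Walk the list from the end so deletions never shift items still to be visited.
for i, sentence in enumerate(reversed(sentences), start=1):
    if not any(word in sentence for word in wordsMatch):
        # -i is the position counted from the end of the original list;
        # every earlier deletion moved it one step closer to the end.
        del sentences[-i + removed]
        removed += 1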
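If the goal is only to modify the existing list object rather than build a new one, slice assignment is a simpler alternative to the index bookkeeping above. This is a different technique from the loop, shown here as a sketch; the name alias exists only to demonstrate that the same list object is updated:

```python
sentences = [
    "There are three apples and oranges in the fridge.",
    "I forgot the milk.",
]
wordsMatch = ["apples", "bananas", "fridge"]

alias = sentences  # second reference to the same list object

# Assigning to the full slice replaces the list's contents in place,
# so every reference to the list sees the filtered result.
sentences[:] = [s for s in sentences if any(w in s for w in wordsMatch)]

print(alias)
# ['There are three apples and oranges in the fridge.']
```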