How to return more than one match from a list of text?

Time:12-14

I currently have a function that returns a term and the sentence it occurs in. Right now, the function only retrieves the first match from the list of terms. I would like to retrieve all matches instead of just the first.

For example, the terms are list_of_matches = ["heart attack", "cardiovascular", "hypoxia"] and the sentences are text_list = ["A heart attack is a result of cardiovascular...", "Chronic intermittent hypoxia is the..."]

The ideal output is:

['heart attack', 'a heart attack is a result of cardiovascular...'],
['cardiovascular', 'a heart attack is a result of cardiovascular...'],
['hypoxia', 'chronic intermittent hypoxia is the...']

# this is the current function
def find_word(list_of_matches, line):
    for words in list_of_matches:
        if any([words in line]):
            return words, line

# returns list of 'term, matched string'
key_vals = [list(find_word(list_of_matches, line.lower())) for line in text_list
            if find_word(list_of_matches, line.lower()) is not None]

# output is currently 
['heart attack', 'a heart attack is a result of cardiovascular...'],
['hypoxia', 'chronic intermittent hypoxia is the...']

CodePudding user response:

You're going to want to use regex here.

import re

def find_all_matches(words_to_search, text):
    matches = []
    for word in words_to_search:
        # re.escape treats the term literally; re.search returns None when nothing matches
        match = re.search(re.escape(word), text)
        if match:
            matches.append(match.group())
    return [matches, text]

Please note that this returns a nested list: the list of matched terms first, followed by the text they were found in.
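
If you want the regex approach to produce the [term, sentence] pairs from the question, something along these lines might work. This is only a sketch: the name find_term_sentence_pairs is made up for illustration, it assumes the terms are already lowercase, and the \b word boundaries are an extra assumption (they stop a term from matching inside a longer word).

```python
import re

def find_term_sentence_pairs(terms, sentences):
    """Return a [term, sentence] pair for every term found in each sentence."""
    pairs = []
    for sentence in sentences:
        lowered = sentence.lower()
        for term in terms:
            # \b limits the match to whole words; re.escape keeps the term literal.
            if re.search(r"\b" + re.escape(term) + r"\b", lowered):
                pairs.append([term, lowered])
    return pairs

list_of_matches = ["heart attack", "cardiovascular", "hypoxia"]
text_list = [
    "A heart attack is a result of cardiovascular...",
    "Chronic intermittent hypoxia is the...",
]
pairs = find_term_sentence_pairs(list_of_matches, text_list)
```

For plain substring matching without word boundaries, the `in` operator is enough and regex is not strictly needed.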

CodePudding user response:

The solution needs 2 steps:

  1. fix the function
  2. process the output

Given that your desired output follows the pattern

    output = [
      [word1, sentence1],
      [word2, sentence1],
      [word3, sentence2],
    ]

  1. Fix the function: the return inside the 'for' loop exits on the first matching word. Instead, collect every word of list_of_matches that occurs in the line and return the whole list after the loop finishes.

The function should look like this:

    def find_word(list_of_matches, line):
        answer = []
        for words in list_of_matches:
            if words in line:
                answer.append([words, line])
        return answer

With the function above, the output will be:

    key_vals = [
      [
        ['heart attack', 'a heart attack is a result of cardiovascular...'],
        ['cardiovascular', 'a heart attack is a result of cardiovascular...']
      ],
      [
        ['hypoxia', 'chronic intermittent hypoxia is the...']
      ]
    ]

  2. Process the output: "key_vals" is now a list with one inner list per sentence, so flatten it into a single list of [word, sentence] pairs with the following code:

    output = []
    for word_sentence_list in key_vals:
        for word_sentence in word_sentence_list:
            output.append(word_sentence)

and, finally, the output will be:

    output = [
     ['heart attack', 'a heart attack is a result of cardiovascular...'],
     ['cardiovascular', 'a heart attack is a result of cardiovascular...'],
     ['hypoxia', 'chronic intermittent hypoxia is the...']
    ]
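
The two steps above can also be folded into one pass. The sketch below is just one possible way to do it, not the only one: find_words and flatten are hypothetical helper names, and it assumes the terms in list_of_matches are already lowercase, as in the question.

```python
from itertools import chain

def find_words(list_of_matches, text_list):
    """Build the flat [word, sentence] list directly, one pair per match."""
    return [
        [word, line.lower()]
        for line in text_list          # outer loop: each sentence
        for word in list_of_matches    # inner loop: each search term
        if word in line.lower()        # keep only terms present in the sentence
    ]

def flatten(key_vals):
    """Flatten a list of per-sentence lists, as step 2 does with nested loops."""
    return list(chain.from_iterable(key_vals))

list_of_matches = ["heart attack", "cardiovascular", "hypoxia"]
text_list = [
    "A heart attack is a result of cardiovascular...",
    "Chronic intermittent hypoxia is the...",
]
output = find_words(list_of_matches, text_list)
```

Sentences with no matching term simply contribute no pairs, so there is no need to filter out None or empty lists afterwards.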
