Return a name from the list, if found any of the matches from a test string

Time:01-21

The output should return the matching name from the file.

I have a file containing thousands of names, for example:

["jake","mike","Tyson","Sachin"]

test_string="i love mike and i live in Mexico, the city i love much...."

Expected output: Found, mike

I am using this code:

import re
search_list = ["jake","mike","Tyson","Sachin"]
long_string = 'i love MiKe and i live in Mexico, the city i love much....'
if re.compile('|'.join(search_list),re.IGNORECASE).search(long_string):
    print("found")
else:
    print("Not Found")

Output: found

I am looking for a solution that prints the exact name as it appears in 'search_list' while ignoring case. It should also have low time complexity, since it needs to work on millions of records.

CodePudding user response:

If you have a reasonably small list of target search keywords, you could form a regex alternation based on that list and then use re.findall:

import re
search_list = ["Jake", "mike", "Tyson", "Sachin"]
regex = r'\b(' + r'|'.join(search_list) + r')\b'
long_string = 'i love MiKe and i live in Mexico, the city i love much....'
matches = re.findall(regex, long_string, flags=re.I)
print(matches)  # ['MiKe']
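
Note that re.findall returns the text as it appears in the string ('MiKe'), while the question asks for the spelling stored in the list ('mike'). A minimal sketch of one way to map each match back to the list entry follows. The names canonical and find_names are hypothetical helpers, not part of the answer above, and the sketch assumes the names are plain ASCII so that lower() agrees with re.IGNORECASE.

```python
import re

search_list = ["jake", "mike", "Tyson", "Sachin"]

# Map each lowercased name to its original spelling in search_list.
canonical = {name.lower(): name for name in search_list}

# Escape each name so any regex metacharacters in it are matched literally.
pattern = re.compile(
    r"\b(?:" + "|".join(re.escape(name) for name in search_list) + r")\b",
    flags=re.IGNORECASE,
)

def find_names(text):
    """Return the search_list spelling of every name found in text, in order."""
    return [canonical[m.group(0).lower()] for m in pattern.finditer(text)]

long_string = "i love MiKe and i live in Mexico, the city i love much...."
print(find_names(long_string))  # ['mike']
```

Compiling the pattern once and reusing it avoids rebuilding the alternation for every string that is searched.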

CodePudding user response:

A simpler, more naive solution is to use a Python dictionary, which handles large amounts of data well and offers constant-time lookups on average.

Assuming the search_list is always going to be smaller than the test_string, you could store the names from the search list as dictionary keys, then split the test string on spaces into a list of words.

Now iterate over that list of words and increment the dictionary value whenever a word is one of the keys. Since a dictionary lookup takes O(1) time on average, the overall time complexity should be O(n), where n is the number of words in the test string, because the list is traversed only once.

You could then print every name whose frequency count is greater than 0.
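
The dictionary approach described above could be sketched roughly as follows. The names lookup and count_names are hypothetical, and the sketch departs from the answer in two ways, both of which are assumptions: it extracts words with re.findall(r"\w+", ...) instead of splitting on spaces, so trailing punctuation such as "mike," does not block a match, and it uses casefold() to ignore case. It only handles single-word names.

```python
import re
from collections import Counter

search_list = ["jake", "mike", "Tyson", "Sachin"]

# Case-folded name -> spelling as stored in search_list.
lookup = {name.casefold(): name for name in search_list}

def count_names(text):
    """Count how often each search_list name occurs as a whole word in text."""
    counts = Counter()
    for word in re.findall(r"\w+", text):
        name = lookup.get(word.casefold())  # average O(1) dictionary lookup
        if name is not None:
            counts[name] += 1
    return counts

test_string = "i love mike and i live in Mexico, the city i love much...."
for name in count_names(test_string):
    print("Found,", name)  # Found, mike
```

The cost is one pass over the words of the string, regardless of how many names are in the list.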

CodePudding user response:

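names = ["jake", "mike", "Tyson", "Sachin"]
search_string = "i love mike and i live in Mexico, the city i love much...."
# Strip common punctuation and lowercase each word once, so the comparison ignores case.
words = {word.strip(".,;:!?").lower() for word in search_string.split()}
for name in names:
    # Set membership replaces the inner loop over every word.
    if name.lower() in words:
        print("Found,", name)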