The output should return the match from the file
I have a file containing thousands of names, for example:
search_list = ["jake","mike","Tyson","Sachin"]
test_string = "i love mike and i live in Mexico, the city i love much...."
Expected output: Found, mike
This is the code I am using:
import re
search_list = ["jake","mike","Tyson","Sachin"]
long_string = 'i love MiKe and i live in Mexico, the city i love much....'
if re.compile('|'.join(search_list),re.IGNORECASE).search(long_string):
    print("found")
else:
    print("Not Found")
Output: found
I am looking for a solution that prints the exact name as it appears in 'search_list', ignoring case when matching. It should also have low time complexity, since it needs to work for millions of records.
CodePudding user response:
If you have a reasonably small list of target search keywords, you could form a regex alternation based on that list and then use re.findall:
import re
search_list = ["Jake", "mike", "Tyson", "Sachin"]
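# Build \b(Jake|mike|Tyson|Sachin)\b; re.escape guards against regex metacharacters in names
regex = r'\b(' + '|'.join(map(re.escape, search_list)) + r')\b'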
long_string = 'i love MiKe and i live in Mexico, the city i love much....'
matches = re.findall(regex, long_string, flags=re.I)
print(matches) # ['MiKe']
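Note that re.findall returns the text as it appears in the string ('MiKe'), not the spelling from search_list. A minimal sketch of one way to map each match back to the list entry, assuming the names are plain ASCII and that no two entries differ only by case (the names `canonical`, `pattern` and `found` are only illustrative):

```python
import re

search_list = ["jake", "mike", "Tyson", "Sachin"]
long_string = 'i love MiKe and i live in Mexico, the city i love much....'

# Map each lowercased name back to its spelling in search_list.
canonical = {name.lower(): name for name in search_list}

# Compile once; re.escape guards against regex metacharacters in names.
pattern = re.compile(
    r'\b(?:' + '|'.join(map(re.escape, search_list)) + r')\b',
    flags=re.IGNORECASE,
)

# Look up every match by its lowercased text to recover the list spelling.
found = [canonical[m.group(0).lower()] for m in pattern.finditer(long_string)]
for name in found:
    print("Found,", name)
```

Compiling the pattern once and reusing it avoids rebuilding the alternation for every string you search.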
CodePudding user response:
A simpler, more naive solution is to use a Python dictionary, which handles large amounts of data well and offers O(1) lookups on average.
Assuming the search_list is always smaller than the test_string, you could store the names from the search list as dictionary keys (lowercased, so that matching ignores case) with a count of 0 as each value, then split the test string on spaces and store the words in a list.
Now run through that list of words and increment the count in the dictionary whenever a word is one of the keys. Since finding a key in a dictionary takes O(1) time on average, the overall time complexity should be O(n), where n is the number of words in the test_string, because the list is iterated only once.
You could then print the names whose frequency count is greater than 0.
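The steps above could be sketched roughly as follows. This is only an illustration: the function name `find_names` is made up, it splits on word characters with re.findall rather than on spaces so that trailing punctuation (as in "Mexico,") does not get in the way, and it assumes every name is a single word.

```python
import re

def find_names(search_list, text):
    """Count how often each name from search_list occurs in text, ignoring case."""
    # Lowercased name -> spelling as it appears in search_list.
    lookup = {name.lower(): name for name in search_list}
    counts = {}
    # One pass over the words; each dictionary lookup is O(1) on average.
    for word in re.findall(r"\w+", text.lower()):
        name = lookup.get(word)
        if name is not None:
            counts[name] = counts.get(name, 0) + 1
    return counts

search_list = ["jake", "mike", "Tyson", "Sachin"]
test_string = 'i love MiKe and i live in Mexico, the city i love much....'

counts = find_names(search_list, test_string)
for name in counts:
    print("Found,", name)
```

Only names that actually occur end up in `counts`, so there is no need to filter for a count greater than 0 afterwards.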
CodePudding user response:
Names = ["jake", "mike", "Tyson", "Sachin"]
search_string = "i love mike and i live in Mexico, the city i love much...."
final_search_string = search_string.split()
for i in Names:
    for j in final_search_string:
        # Compare in lowercase so that the match ignores case
        if i.lower() == j.lower():
            print("Found", i)