I am using regex in Python to search for multiple patterns in a string. A simplified example would be as follows:
import regex
s = "vrhvydhvkzejjvksdlstringvhehvehvurejlcslvdk" #string to look into
p = ['(?P<string>string)', '(?P<longtext>longtext)'] #patterns to search for
r = regex.compile('(?b)(' + " | ".join(p) + '){s<=3}') #regex, allowing for 3 mismatches, bestmatch to be reported
r.search(s) #searching for patterns p in string s
<regex.Match object; span=(18, 25), match='stringv', fuzzy_counts=(1, 0, 0)> #search results
My expected result would be:
<regex.Match object; span=(18, 24), match='string', fuzzy_counts=(0, 0, 0)>
Why does regex report the fuzzy match "stringv" with 1 mismatch instead of the exact match "string"? And how do I need to modify my code to get my expected result?
I am using Python 3.7.3 and regex 2.5.115.
CodePudding user response:
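The expression '(?b)(' + " | ".join(p) + '){s<=3}' results in the regex (?b)((?P<string>string) | (?P<longtext>longtext)){s<=3}. Note the spaces around |: they are literal pattern characters, so the first alternative is really "string" followed by a space. The string s contains no space, so an exact match is impossible. The best available match substitutes v for that space when matching the (?P<string>string) part, which is why you get "stringv" with one substitution.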
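To see the effect of the spaces in isolation, here is a minimal sketch that uses the standard-library re module instead of regex. That swap is an assumption made so the snippet runs without extra packages; re has no fuzzy matching, so only the literal-space behaviour is shown. The sample strings "xxstringxx" and "xxstring xx" are made up for illustration.

```python
import re

p = ['(?P<string>string)', '(?P<longtext>longtext)']

# Joining with " | " puts literal spaces into the pattern;
# joining with "|" does not.
spaced = '(' + " | ".join(p) + ')'
tight = '(' + "|".join(p) + ')'

print(spaced)  # ((?P<string>string) | (?P<longtext>longtext))
print(tight)   # ((?P<string>string)|(?P<longtext>longtext))

# Without the VERBOSE flag a space is an ordinary character, so the
# first alternative of `spaced` only matches "string" plus a space.
print(re.search(spaced, "xxstringxx"))           # None
print(re.search(spaced, "xxstring xx").group())  # 'string ' (7 characters)
print(re.search(tight, "xxstringxx").group())    # 'string'
```

With fuzzy matching enabled in regex, that required trailing space is the character that ends up substituted.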
You need to join the patterns without the spaces:
r = regex.compile('(?b)(' + "|".join(p) + '){s<=3}') #regex, allowing for 3 mismatches, bestmatch to be reported
See the Python demo:
import regex
s = "vrhvydhvkzejjvksdlstringvhehvehvurejlcslvdk" #string to look into
p = ['(?P<string>string)', '(?P<longtext>longtext)'] #patterns to search for
rx = '(?e)(' + "|".join(p) + '){s<=3}'
r = regex.compile(rx) #regex, allowing for 3 mismatches, bestmatch to be reported
print( r.search(s) )
# => <regex.Match object; span=(18, 24), match='string'>