I ran the following code in Python 3.8:
import re
a,b,c = 'g(x)g', '(x)g', 'g(x)'
a_re = re.compile(rf"(\b{re.escape(a)}\b)", re.I)
b_re = re.compile(rf"(\b{re.escape(b)}\b)", re.I)
c_re = re.compile(rf"(\b{re.escape(c)}\b)", re.I)
a_re.findall('g(x)g')
b_re.findall('(x)g')
c_re.findall('g(x)')
c_re.findall(' g(x) ')
The result I want is below.
['g(x)g']
['(x)g']
['g(x)']
['g(x)']
But the actual result is below.
['g(x)g']
[]
[]
[]
The following conditions must be met:
The patterns must be built from variables with an f-string.
\b must not be removed, because I want to know whether certain characters appear in the sentence.
How can I get the results I want?
Answer:
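\b matches at a boundary between a \w (word) character and a \W (non-word) character, where the start and end of the string count as \W (Docs). That is why your first pattern matches (the term starts and ends with word characters) but the others do not: '(x)g' starts with '(' and 'g(x)' ends with ')', so there is no word boundary at those ends.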
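To see where \b can match, you can list the positions of its zero-width matches with re.finditer. This is a small illustrative check (boundary_positions is a hypothetical helper name, not part of the re module):

```python
import re

def boundary_positions(text):
    # \b is zero-width, so every match is empty; m.start() is the index
    # between two characters where a word boundary holds.
    return [m.start() for m in re.finditer(r"\b", text)]

print(boundary_positions('g(x)'))  # [0, 1, 2, 3]  (no boundary at index 4, after ')')
print(boundary_positions('(x)g'))  # [1, 2, 3, 4]  (no boundary at index 0, before '(')
```

The missing boundary after ')' in 'g(x)' and before '(' in '(x)g' is exactly where c_re and b_re require one.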
To get the expected result, your patterns should look like these:
a_re = re.compile(rf"(\b{re.escape(a)}\b)", re.I)  # No change
b_re = re.compile(rf"({re.escape(b)}\b)", re.I)    # No '\b' at the beginning
c_re = re.compile(rf"(\b{re.escape(c)})", re.I)    # No '\b' at the end
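If the terms come from variables and you do not know in advance which end starts or finishes with punctuation, the same idea can be applied automatically. A minimal sketch, where bounded_pattern is a hypothetical helper name: it keeps \b on each side where the term begins or ends with a \w character and omits it only on a side where the term has a non-word character.

```python
import re

def bounded_pattern(term):
    # Hypothetical helper: keep \b only on a side where the term itself has a
    # word character. Next to '(' or ')', a \b would require the neighbouring
    # text to be a word character, which is not what we want here.
    start = r"\b" if re.match(r"\w", term) else ""
    end = r"\b" if re.search(r"\w\Z", term) else ""
    return re.compile(rf"({start}{re.escape(term)}{end})", re.I)

a, b, c = 'g(x)g', '(x)g', 'g(x)'
print(bounded_pattern(a).findall('g(x)g'))   # ['g(x)g']
print(bounded_pattern(b).findall('(x)g'))    # ['(x)g']
print(bounded_pattern(c).findall('g(x)'))    # ['g(x)']
print(bounded_pattern(c).findall(' g(x) '))  # ['g(x)']
```

Because the leading \b is kept for 'g(x)', it is still rejected inside a longer word such as 'ag(x)'.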
Answer:
You can write your own \b by matching the start of the string, the end of the string, or a separator character, without capturing it:
(^|[ .\"\']) start of string or separator
($|[ .\"\']) end of string or separator
(?:...) non-capturing group
>>> a_re = re.compile(rf"(?:^|[ .\"\'])({re.escape(a)})(?:$|[ .\"\'])", re.I)
>>> b_re = re.compile(rf"(?:^|[ .\"\'])({re.escape(b)})(?:$|[ .\"\'])", re.I)
>>> c_re = re.compile(rf"(?:^|[ .\"\'])({re.escape(c)})(?:$|[ .\"\'])", re.I)
>>> a_re.findall('g(x)g')
['g(x)g']
>>> b_re.findall('(x)g')
['(x)g']
>>> c_re.findall('g(x)')
['g(x)']
>>> c_re.findall(' g(x) ')
['g(x)']
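One limitation of this approach is that the separator is consumed by the match, so two occurrences separated by a single space are not both found ('g(x) g(x)' yields only one match). A variant that avoids this, sketched with a hypothetical compile_term helper and the same separator set, uses zero-width lookarounds instead of non-capturing groups:

```python
import re

# Same separators as above: space, period, double quote, single quote.
NOT_SEP = "[^ .\"']"

def compile_term(term):
    # Hypothetical helper. (?<!...) and (?!...) inspect the neighbouring
    # character without consuming it. "Not preceded by a non-separator" is
    # also true at the start of the string (likewise at the end for (?!...)).
    return re.compile(rf"(?<!{NOT_SEP}){re.escape(term)}(?!{NOT_SEP})", re.I)

c = 'g(x)'
consuming = re.compile(rf"(?:^|[ .\"\'])({re.escape(c)})(?:$|[ .\"\'])", re.I)
print(consuming.findall('g(x) g(x)'))          # ['g(x)']
print(compile_term(c).findall('g(x) g(x)'))    # ['g(x)', 'g(x)']
print(compile_term('(x)g').findall('(x)g'))    # ['(x)g']
```

Since the pattern has no capturing group, findall returns the whole match. If your text uses other separators (commas, for example), add them to the character class.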