Why doesn't re.compile(rf"(\b{re.escape('g(x)')}\b)",re.I) find the stri

Time:06-21

I ran the following code in Python 3.8:

import re

a,b,c = 'g(x)g', '(x)g', 'g(x)'
a_re = re.compile(rf"(\b{re.escape(a)}\b)", re.I)
b_re = re.compile(rf"(\b{re.escape(b)}\b)", re.I)
c_re = re.compile(rf"(\b{re.escape(c)}\b)", re.I)

a_re.findall('g(x)g')
b_re.findall('(x)g')
c_re.findall('g(x)')
c_re.findall(' g(x) ')

This is the result I want:

['g(x)g']
['(x)g']
['g(x)']
['g(x)']

But this is the actual result:

['g(x)g']
[]
[]
[]

The following conditions must be observed:

A combination of variables and f-string should be used.
\b must not be removed.

This is because I want to know whether certain strings occur in the sentence.

How can I get the results I want?


\b works fine for words made of ordinary word characters, but not for words that start with '(' or end with ')'.

I was wondering if there is an alternative to \b that can be used for these words.

I need something that behaves like \b because I want to make sure that the sentence contains a specific word.

CodePudding user response:

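\b matches at the boundary between a \w (word) character and a \W (non-word) character, or between a \w character and the start or end of the string (Docs). That is why only your first pattern matches: 'g(x)g' starts and ends with a word character, while '(x)g' starts with '(' and 'g(x)' ends with ')', and there is no word boundary between a parenthesis and the start of the string, the end of the string, or a space.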
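To see this for the three strings in the question, here is a small sketch that lists every index where \b matches. The helper name boundary_positions is made up for illustration.

```python
import re

def boundary_positions(text):
    # Return every index in text where \b matches (hypothetical helper name).
    return [m.start() for m in re.finditer(r"\b", text)]

for text in ["g(x)g", "(x)g", "g(x)"]:
    print(repr(text), boundary_positions(text))
# 'g(x)g' [0, 1, 2, 3, 4, 5]
# '(x)g' [1, 2, 3, 4]
# 'g(x)' [0, 1, 2, 3]
```

Index 0 is missing for '(x)g' and index 4 is missing for 'g(x)', which is exactly where the original patterns require a \b.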

To get the expected result, your patterns should look like these:

a_re = re.compile(rf"(\b{re.escape(a)}\b)", re.I)  # unchanged
b_re = re.compile(rf"({re.escape(b)}\b)", re.I)  # no \b at the start: '(' is not a word character
c_re = re.compile(rf"(\b{re.escape(c)})", re.I)  # no \b at the end: ')' is not a word character
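If the terms come from variables, the same idea can be automated. The following is only a sketch: compile_term is a hypothetical helper that adds \b on a side only when the term has a word character there.

```python
import re

def compile_term(term):
    # Hypothetical helper: keep \b only on the sides where the term
    # actually starts or ends with a word character.
    left = r"\b" if re.match(r"\w", term) else ""
    right = r"\b" if re.search(r"\w\Z", term) else ""
    return re.compile(rf"({left}{re.escape(term)}{right})", re.I)

print(compile_term("g(x)g").findall("g(x)g"))
print(compile_term("(x)g").findall("(x)g"))
print(compile_term("g(x)").findall("g(x)"))
print(compile_term("g(x)").findall(" g(x) "))
```

Note that a side without \b is not checked at all, so the pattern for '(x)g' would also match inside 'f(x)g'.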

CodePudding user response:

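You can write your own \b by matching the start of the string, the end of the string, or a separator character on each side, without capturing it:

  • (^|[ .\"\']) start of string or a separator
  • ($|[ .\"\']) end of string or a separator
  • (?:...) non-capturing group, so only the term itself is returned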
>>> a_re = re.compile(rf"(?:^|[ .\"\'])({re.escape(a)})(?:$|[ .\"\'])", re.I)
>>> b_re = re.compile(rf"(?:^|[ .\"\'])({re.escape(b)})(?:$|[ .\"\'])", re.I)
>>> c_re = re.compile(rf"(?:^|[ .\"\'])({re.escape(c)})(?:$|[ .\"\'])", re.I)
>>> a_re.findall('g(x)g')
['g(x)g']
>>> b_re.findall('(x)g')
['(x)g']
>>> c_re.findall('g(x)')
['g(x)']
>>> c_re.findall(' g(x) ')
['g(x)']
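A variation on the same idea, offered as a sketch rather than a drop-in answer, is to use the lookarounds (?<!\w) and (?!\w) in place of \b. They only check that no word character sits next to the term and do not consume the separator, so any non-word character counts as a boundary. The helper name whole_term is made up for illustration.

```python
import re

def whole_term(term):
    # Hypothetical helper: the term must not be preceded or followed
    # by a word character; the neighbours themselves are not consumed.
    return re.compile(rf"(?<!\w)({re.escape(term)})(?!\w)", re.I)

print(whole_term("g(x)g").findall("g(x)g"))
print(whole_term("(x)g").findall("(x)g"))
print(whole_term("g(x)").findall(" g(x) "))
print(whole_term("g(x)").findall("g(x) g(x)"))  # both occurrences
print(whole_term("g(x)").findall("fg(x)"))      # preceded by 'f', no match
```

Because the separator is not consumed, two occurrences separated by a single space are both found, whereas a pattern that consumes the separator finds only the first one in 'g(x) g(x)'.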