Let's say you have the following simple code:
import re
expr = re.escape(r'C++')
re.sub(expr, 'Java', 'Hello, C++')
The program returns 'Hello, Java'.
If I change the input string to 'Hello, C++123', the program returns 'Hello, Java123'.
I need the program to match only the complete word, so I added word boundaries: r'\bC++\b'. New program code:
import re
expr = re.escape(r'\bC++\b')
re.sub(expr, 'Java', 'Hello, C++')
As a result, I get the string 'Hello, C++' instead of 'Hello, Java'.
How can I fix this script?
CodePudding user response:
Word boundaries won't work as intended on a + character, which is a non-word character. Recall that \b matches a boundary between a word character and a non-word character. Both space and + are non-word characters, so \b won't detect that boundary. Instead, you might find that the following works:
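import re
# Escape only the literal text; the lookarounds must stay unescaped to keep their regex meaning.
expr = r'(?<!\S)' + re.escape(r'C++') + r'(?!\S)'
output = re.sub(expr, 'Java', 'Hello, C++')
print(output)  # Hello, Java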
The pattern used says to match:
(?<!\S) assert that what precedes is either whitespace OR the start of the string
C\+\+ match C++
(?!\S) assert that what follows is either whitespace OR the end of the string
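To compare the two approaches side by side, here is a minimal sketch. The helper name replace_token and the sample strings are illustrative assumptions, not part of the original question.

```python
import re

def replace_token(text, token, replacement):
    """Replace `token` only where it is delimited by whitespace or the string edges."""
    # Only the literal token is escaped; the lookarounds stay as regex syntax.
    pattern = r'(?<!\S)' + re.escape(token) + r'(?!\S)'
    return re.sub(pattern, replacement, text)

# \b needs a word character on exactly one side, so it fails after the final '+'.
boundary_match = re.search(r'\bC\+\+\b', 'Hello, C++')

standalone = replace_token('Hello, C++', 'C++', 'Java')
glued = replace_token('Hello, C++123', 'C++', 'Java')
print(boundary_match, standalone, glued)
```

Note that with this pattern a token followed directly by punctuation, such as 'C++.', is also left alone, because '.' is not whitespace. If that case matters, the lookahead would need to be relaxed.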
CodePudding user response:
Beyond the interaction between + and \b already explained by TimBiegeleisen, note that if you want \b to be a word boundary, not a literal \b, you should not pass it through re.escape. Consider the following example:
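import re
text = 'Text with f.e.w words'
# pat1 escapes everything, so \b becomes a literal backslash followed by 'b'
pat1 = re.escape(r'\bf.e.w\b')
# pat2 escapes only the literal text and keeps \b as a word boundary
pat2 = r'\b' + re.escape(r'f.e.w') + r'\b'
if re.search(pat1, text):
    print('Found pat1 in text')
if re.search(pat2, text):
    print('Found pat2 in text')
Output:
Found pat2 in text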
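Printing the two patterns makes the difference visible. This is a minimal sketch reusing the strings from the example above; the variable names are only illustrative.

```python
import re

# Escaping the whole pattern turns \b into a literal backslash followed by 'b'.
escaped_everything = re.escape(r'\bf.e.w\b')
# Escaping only the literal text keeps \b as a word-boundary assertion.
escaped_literal_only = r'\b' + re.escape(r'f.e.w') + r'\b'

print(escaped_everything)
print(escaped_literal_only)
```

In the first pattern the backslash itself is escaped, so the regex engine looks for the two characters backslash and b in the text. In the second, only the dots are escaped.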