Replacing a word in text using Python and the re module

Let's say you have the following simple code:

import re

expr = re.escape(r'C++')
re.sub(expr, 'Java', 'Hello, C++')

The program will return the result 'Hello, Java'.

If I change the input string to 'Hello, C++123', the program returns 'Hello, Java123'.

I need the program to match only the complete word, so I added word boundaries: r'\bC++\b'. The new code:

import re

expr = re.escape(r'\bC++\b')
re.sub(expr, 'Java', 'Hello, C++')

As a result, I get the string 'Hello, C++' instead of 'Hello, Java'. How can I fix this script?

CodePudding user response：

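Word boundaries won't work as intended next to a + character, which is a non-word character. Recall that \b matches only at a boundary between a word character and a non-word character. Both a space and + are non-word characters, and so is the end of the string as far as \b is concerned, so \b finds no boundary after the final +. Instead, you can assert that the match is surrounded by whitespace or the edges of the string, and escape only the literal text: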

import re
# Escape only the literal text; the lookarounds must stay unescaped
expr = r'(?<!\S)' + re.escape('C++') + r'(?!\S)'
output = re.sub(expr, 'Java', 'Hello, C++')
print(output)  # Hello, Java

The pattern used says to match:

(?<!\S)  assert that what precedes is either whitespace OR the start of the string
C\+\+    match the literal text C++
(?!\S)   assert that what follows is either whitespace OR the end of the string
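
To compare the two approaches on the inputs from the question, here is a minimal sketch. The helper name replace_token is hypothetical and not part of the re module; it assumes that "whole word" means "delimited by whitespace or the start/end of the string".

```python
import re

def replace_token(text, old, new):
    """Replace `old` only where whitespace or a string edge surrounds it."""
    # Escape only the literal text; the lookarounds stay unescaped.
    pattern = r'(?<!\S)' + re.escape(old) + r'(?!\S)'
    # A callable replacement inserts `new` literally, with no backslash processing.
    return re.sub(pattern, lambda m: new, text)

# The \b variant, built the same way, for comparison.
boundary_pat = r'\b' + re.escape('C++') + r'\b'

print(replace_token('Hello, C++', 'C++', 'Java'))            # Hello, Java
print(replace_token('Hello, C++123', 'C++', 'Java'))         # Hello, C++123
print(re.search(boundary_pat, 'Hello, C++'))                 # None
print(re.search(boundary_pat, 'Hello, C++123') is not None)  # True
```

Note that the \b version behaves the opposite way from what was wanted: it misses 'C++' at the end of the string but matches inside 'C++123', because the only boundary after + is one followed by a word character. One limitation of the lookaround version is that it will not match when punctuation follows directly, as in 'C++.', so the lookarounds may need adjusting for such input.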

CodePudding user response：

Beyond the interaction between + and \b already explained by TimBiegeleisen, note that if you want \b to act as a word boundary rather than a literal backslash followed by b, you should not pass it through re.escape. Consider the following example:

import re
text = 'Text with f.e.w words'
pat1 = re.escape(r'\bf.e.w\b')
pat2 = r'\b' + re.escape(r'f.e.w') + r'\b'
if re.search(pat1, text):
    print('Found pat1 in text')
if re.search(pat2, text):
    print('Found pat2 in text')

Output:

Found pat2 in text
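
To see why pat1 finds nothing, it may help to print what re.escape actually produces, and to check what happens if nothing is escaped at all. This is only an illustrative sketch; the sample string 'a fxexw here' is made up for the comparison.

```python
import re

# Escaping the whole pattern turns \b into a literal backslash followed by "b".
escaped_all = re.escape(r'\bf.e.w\b')
print(escaped_all)  # \\bf\.e\.w\\b

# Escaping nothing keeps \b working, but each "." then matches any character.
unescaped = r'\bf.e.w\b'
# Escaping only the literal part keeps both behaviours correct.
mixed = r'\b' + re.escape('f.e.w') + r'\b'

print(bool(re.search(unescaped, 'a fxexw here')))       # True
print(bool(re.search(mixed, 'a fxexw here')))           # False
print(bool(re.search(mixed, 'Text with f.e.w words')))  # True
```

So the rule of thumb is to apply re.escape to the text you want matched literally, and concatenate the regex syntax around it afterwards.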