Replacing a word in text using Python and the re module

Time:12-02

Let's say you have the following simple code:

import re

expr = re.escape(r'C++')
re.sub(expr, 'Java', 'Hello, C++')

The program will return the result 'Hello, Java'.

If I change the input string to 'Hello, C++123', the program returns the result 'Hello, Java123'.

I need the program to match only the complete word, so I add word-boundary anchors: r'\bC++\b'. New program code:

import re

expr = re.escape(r'\bC++\b')
re.sub(expr, 'Java', 'Hello, C++')

As a result, I get the string 'Hello, C++' instead of 'Hello, Java'. How can I fix this script?

CodePudding user response:

Word boundaries won't work as intended next to a + character, which is a non-word character. Recall that \b matches only at a boundary between a word character and a non-word character. Both space and + are non-word characters (and the end of the string counts as non-word too), so \b finds no boundary after the final +. Instead, you might find that the following works:

import re
# Escape only the literal text; the lookarounds must stay unescaped.
expr = r'(?<!\S)' + re.escape('C++') + r'(?!\S)'
output = re.sub(expr, 'Java', 'Hello, C++')
print(output)  # Hello, Java

The pattern used says to match:

(?<!\S)  assert that what precedes is either whitespace OR the start of the string
C\+\+    match the literal text C++
(?!\S)   assert that what follows is either whitespace OR the end of the string
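The explanation above can be sketched as a small helper. This is only an illustration under stated assumptions: the function name replace_whole_token and its parameters are hypothetical and do not come from the original answer. It first shows why \b fails after a trailing +, then applies the whitespace lookarounds around an escaped literal.

```python
import re

# \b needs a word character on one side. After the final '+' there is only
# the end of the string, so no boundary exists and the search fails.
print(re.search(r'\bC\+\+\b', 'Hello, C++'))  # None


def replace_whole_token(text, token, replacement):
    """Hypothetical helper: replace `token` only where it is delimited by
    whitespace or by the start/end of the string."""
    # Escape only the literal token; the lookarounds stay unescaped.
    pattern = r'(?<!\S)' + re.escape(token) + r'(?!\S)'
    # A callable replacement inserts `replacement` verbatim, without
    # interpreting backslashes or group references in it.
    return re.sub(pattern, lambda m: replacement, text)


print(replace_whole_token('Hello, C++', 'C++', 'Java'))     # Hello, Java
print(replace_whole_token('Hello, C++123', 'C++', 'Java'))  # Hello, C++123
```

Note that these lookarounds treat any non-whitespace neighbour as part of the token, so 'C++,' (followed by a comma) would not be replaced either. Whether that is desirable depends on the input text.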

CodePudding user response:

Beyond the interaction between + and \b already explained by TimBiegeleisen, note that if you want \b to be a word boundary rather than a literal \b, you should not pass it through re.escape. Consider the following example:

import re
text = 'Text with f.e.w words'
pat1 = re.escape(r'\bf.e.w\b')
pat2 = r'\b' + re.escape(r'f.e.w') + r'\b'
if re.search(pat1, text):
    print('Found pat1 in text')
if re.search(pat2, text):
    print('Found pat2 in text')

Output:

Found pat2 in text
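To see why only the second pattern matches, it may help to print what re.escape actually produces. The following is a minimal sketch; the variable names and the sample strings are made up for illustration.

```python
import re

# Escaping the whole pattern turns each backslash into an escaped backslash,
# so the result looks for a literal backslash followed by 'b'.
escaped_all = re.escape(r'\bf.e.w\b')
print(escaped_all)  # \\bf\.e\.w\\b

# Escaping only the literal text keeps \b as a word-boundary assertion.
escaped_literal_only = r'\b' + re.escape(r'f.e.w') + r'\b'
print(escaped_literal_only)  # \bf\.e\.w\b

# The fully escaped pattern matches only text containing a literal "\b".
print(bool(re.search(escaped_all, r'a literal \bf.e.w\b here')))    # True
print(bool(re.search(escaped_all, 'Text with f.e.w words')))        # False

# The second pattern matches the word, and its dots are literal dots.
print(bool(re.search(escaped_literal_only, 'Text with f.e.w words')))  # True
print(bool(re.search(escaped_literal_only, 'Text with fxexw words')))  # False
```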