Why can't a Python regular expression capture the backslash, even with double/triple backslashes?

Time:04-03

Dear experienced friends, I am a newbie to Python regex and am trying to figure out how to match strings. However, I found a weird case that Python regex cannot capture. Would you mind giving me some hints? Thank you in advance!


Suppose we have a string like this, and our target is finding all the numbers:

text_to_search = '''
321-555-4321
123.555.1234
800\555\1234   # this one is hard to capture
'''

I tried different patterns to capture the number with backslashes.

# pattern 1
pattern = re.compile(r'[0-9]{3}(\|.|-)[0-9]{3}(\|.|-)[0-9]{4}')

# pattern 2
pattern = re.compile(r'\d{3}[\\\.-]\d{3}[\\\.-]\d{4}')

No matter which pattern I choose, it can only capture the first two number strings.

matches = pattern.finditer(text_to_search)

for i in matches: print(i.group(0))

>>> 321-555-4321
>>> 123.555.1234

I also tried increasing the number of backslashes in the pattern to \\, \\\, and \\\\, but none of them worked. Why does this happen, and how can I solve it? Thank you!

CodePudding user response:

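The problem is in text_to_search, not in the pattern. In a normal (non-raw) string literal, a backslash starts an escape sequence: \555 and \123 are read as octal escapes, so the stored string contains no backslash characters at all, and no number of backslashes in the pattern can match them. You need to double each backslash in the literal, like this, or make text_to_search a raw string, as in the second example.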

import re

text_to_search = '''
  321-555-4321
  123.555.1234
  800\\555\\1234   # this one is hard to capture
'''


pattern = re.compile(r'\d{3}[\\\.-]\d{3}[\\\.-]\d{4}')

matches = pattern.finditer(text_to_search)

for i in matches: print(i.group(0))

Raw String method:

import re

text_to_search = R''' 
  321-555-4321
  123.555.1234
  800\555\1234   # this one is hard to capture
'''


pattern = re.compile(r'\d{3}[\\\.-]\d{3}[\\\.-]\d{4}')

matches = pattern.finditer(text_to_search)

for i in matches: print(i.group(0))
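
To see why the original attempt fails, it may help to look at what Python actually stores for the non-raw literal. The sketch below is only an illustration: the names source, plain, and raw are made up for this example, and ast.literal_eval is used just to evaluate the literal from the question without triggering the escape-sequence warning that newer Python versions print.

```python
import ast
import re
import warnings

# The literal exactly as typed in the question, held as source text.
source = r"'800\555\1234'"

# Evaluate it the way Python reads a normal (non-raw) string literal.
# Newer Python versions warn about \555 (an octal escape above \377),
# so warnings are silenced here to keep the output clean.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    plain = ast.literal_eval(source)

# The same text as a raw string: backslashes stay ordinary characters.
raw = r'800\555\1234'

print(ascii(plain))   # '800\u016dS4'  (\555 and \123 became single characters)
print(len(plain), len(raw))   # 6 12
print('\\' in plain, '\\' in raw)   # False True

# Inside a character class, '.' is already literal and a trailing '-'
# is literal too, so only the backslash itself needs escaping.
pattern = re.compile(r'\d{3}[\\.-]\d{3}[\\.-]\d{4}')
print(pattern.findall(plain))   # []
print(pattern.findall(raw) == [raw])   # True
```

Since the non-raw string holds no backslash at all, changing the pattern cannot help; only the way the text is written (doubled backslashes or a raw string) makes a difference.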