Dear experienced friends, I am new to Python regex and am trying to work out how to match some strings. However, I found a weird case that my patterns cannot capture. Would you mind giving me some hints? Thank you in advance!
Suppose we have a string like this, and our goal is to find all the numbers:
text_to_search = '''
321-555-4321
123.555.1234
800\555\1234 # this one is hard to capture
'''
I tried different patterns to capture the number with backslashes.
# pattern 1
pattern = re.compile(r'[0-9]{3}(\\|\.|-)[0-9]{3}(\\|\.|-)[0-9]{4}')
# pattern 2
pattern = re.compile(r'\d{3}[\\\.-]\d{3}[\\\.-]\d{4}')
No matter which pattern I choose, it can only capture the first two number strings.
matches = pattern.finditer(text_to_search)
for i in matches: print(i.group(0))
>>> 321-555-4321
>>> 123.555.1234
I also tried increasing the number of backslashes in the pattern to \\, \\\, and \\\\, but none of them worked. Why does this happen, and how can I solve it? Thank you!
CodePudding user response:
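The problem is in the string text_to_search, not in the pattern. In a normal (non-raw) string literal, a backslash starts an escape sequence, so \555 and \123 are each read as an octal character escape. The resulting string contains no backslashes at all, and no pattern can match them. Either double each backslash in the literal, as in the first version below, or make text_to_search a raw string, as in the second version.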
import re
text_to_search = '''
321-555-4321
123.555.1234
800\\555\\1234 # this one is hard to capture
'''
pattern = re.compile(r'\d{3}[\\\.-]\d{3}[\\\.-]\d{4}')
matches = pattern.finditer(text_to_search)
for i in matches: print(i.group(0))
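To see why the backslashes vanish, it can help to inspect what a literal actually contains. The sketch below is only an illustration with made-up variable names (normal, doubled, raw). It uses \123 rather than \555 because newer Python versions emit a warning for octal escapes above \377; \555 is processed the same way.

```python
# In a normal literal, '\123' is an octal escape for chr(0o123), which is 'S'.
normal = '800\123'
# Doubling the backslash, or using a raw string, keeps a real backslash.
doubled = '800\\123'
raw = r'800\123'

print(repr(normal))      # '800S'
print('\\' in normal)    # False: there is no backslash left to match
print(doubled == raw)    # True: both spell the same 7-character string
print(len(raw))          # 7
```

Printing repr() of the text before running a regex on it is a quick way to check whether the characters you expect are really there.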
Raw String method:
import re
text_to_search = r'''
321-555-4321
123.555.1234
800\555\1234 # this one is hard to capture
'''
pattern = re.compile(r'\d{3}[\\\.-]\d{3}[\\\.-]\d{4}')
matches = pattern.finditer(text_to_search)
for i in matches: print(i.group(0))
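For reuse, the raw-string approach could be wrapped in a small helper. This is a minimal sketch, not the only way to do it: the names PHONE_PATTERN and find_phone_numbers are hypothetical, and the character class is written as [\\.-], which should behave the same as [\\\.-] because a dot needs no escape inside a character class.

```python
import re

# Separator class: a backslash, a dot, or a hyphen.
PHONE_PATTERN = re.compile(r'\d{3}[\\.-]\d{3}[\\.-]\d{4}')

def find_phone_numbers(text):
    """Return every phone-number-like substring found in text."""
    return [m.group(0) for m in PHONE_PATTERN.finditer(text)]

# Raw string, so the backslashes in the last number are kept as-is.
sample = r'''
321-555-4321
123.555.1234
800\555\1234
'''

for number in find_phone_numbers(sample):
    print(number)
```

Note that text read from a file or from user input is not escape-processed; the issue only arises with string literals written in source code.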