strange python regex: not able to find match

Time:10-02

I am facing a strange Python regex issue. The following two strings are supposed to be exactly the same, but somehow they do not match.

import re
print(" \\\"")
print(" " + chr(92) + chr(34) + "")
print(re.search(" \\\"", " " + chr(92) + chr(34) + ""))

However, the following does match:

import re
print("\\\"")
print("" + chr(92) + chr(34) + "")
print(re.search("\\\"", "" + chr(92) + chr(34) + ""))

Any thoughts on what is going on here?

CodePudding user response:

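The issue is that the backslash is processed twice: first by Python when it reads the string literal, and then by the regex engine when it reads the pattern. The literal " \\\"" produces the three characters space, backslash, double quote, exactly like the chr() version, so the two strings really are equal. The regex engine, however, treats that backslash-quote pair as an escaped double quote, so the pattern means "a space followed by a double quote", which never occurs in the subject string. You can use a Python raw string, created by prefixing a string literal with 'r' or 'R'; a raw string treats the backslash (\) as a literal character, so the regex engine receives an escaped backslash (a literal backslash) followed by an escaped double quote.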
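A quick way to see the two levels of backslash processing is to inspect the string Python builds and then hand it to the regex engine. This is a minimal sketch; the variable names `pattern` and `subject` are just for illustration.

```python
import re

pattern = " \\\""                    # Python literal -> 3 chars: space, backslash, quote
subject = " " + chr(92) + chr(34)    # the same 3 chars built explicitly

# Stage 1 (Python string literal): both strings are identical.
print(pattern == subject)            # True
print(list(pattern))                 # [' ', '\\', '"']

# Stage 2 (regex engine): the backslash-quote pair in the pattern is read
# as an escaped quote, so the pattern means "space followed by a quote".
print(re.search(pattern, ' "') is not None)   # True: space + quote matches
print(re.search(pattern, subject))            # None: a backslash sits in between
```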

import re
print(" \\\"")
print(" " + chr(92) + chr(34) + "")
print(re.search(r" \\\"", " " + chr(92) + chr(34) + ""))

Output:

 \"
 \"
<re.Match object; span=(0, 3), match=' \\"'>
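To see why the raw-string pattern works, it helps to list the characters it actually contains. A short sketch, with `raw_pattern` and `subject` as illustrative names:

```python
import re

raw_pattern = r" \\\""     # raw literal: every backslash is kept as typed
print(len(raw_pattern))    # 5
print(list(raw_pattern))   # [' ', '\\', '\\', '\\', '"']

# Regex reading: ' ' is a literal space, '\\' is one literal backslash,
# and '\"' is a literal double quote, i.e. exactly the 3-char subject.
subject = " " + chr(92) + chr(34)
print(re.search(raw_pattern, subject).span())   # (0, 3)
```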

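In the second example, print(re.search("\\\"", "" + chr(92) + chr(34) + "")) outputs <re.Match object; span=(1, 2), match='"'>: only the double quote is matched, because the pattern again reduces to an escaped double quote, which matches a plain double quote.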
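One way to confirm that the pattern from the second example behaves like a plain double quote is to run both side by side. A small sketch; `escaped_quote` and `plain_quote` are illustrative names:

```python
import re

subject = chr(92) + chr(34)        # two characters: backslash, quote

# "\\\"" is the 2-char string backslash+quote; as a regex, that escape
# sequence simply means a literal double quote.
escaped_quote = re.search("\\\"", subject)
plain_quote = re.search('"', subject)

print(escaped_quote.span(), plain_quote.span())   # (1, 2) (1, 2)
```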

You need to either escape the backslash or use a raw string. If you use single quotes around the regex, the double quote does not need to be escaped:

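import re
s = "" + chr(92) + chr(34) + ""     # backslash followed by a double quote
print(re.search("\\\\\"", s))       # normal string: both backslashes escaped for Python
print(re.search(r"\\\"", s))        # raw string with double quotes
print(re.search(r'\\"', s))         # raw string with single quotes: no quote escaping needed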

Output:

<re.Match object; span=(0, 2), match='\\"'>
<re.Match object; span=(0, 2), match='\\"'>
<re.Match object; span=(0, 2), match='\\"'>
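An alternative not mentioned in the answer is `re.escape()` from the standard library, which backslash-escapes regex metacharacters (including the backslash) so that arbitrary text can be searched for literally. A minimal sketch; the `needle` and `haystack` strings are made up for illustration:

```python
import re

needle = " " + chr(92) + chr(34)    # literal text to find: space, backslash, quote
haystack = "say" + needle + "hi"

# re.escape() turns `needle` into a pattern that matches it character for
# character, so no manual backslash counting is needed.
m = re.search(re.escape(needle), haystack)
print(m.span())   # (3, 6)
```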

For further details of the raw string and backslash, see answers to this question.
