I am facing a strange Python regex issue. The following two strings are supposed to be exactly the same, but somehow they do not match.
import re
print(" \\\"")
print(" " + chr(92) + chr(34) + "")
print(re.search(" \\\"", " " + chr(92) + chr(34) + ""))
However, the following does match:
import re
print("\\\"")
print("" + chr(92) + chr(34) + "")
print(re.search("\\\"", "" + chr(92) + chr(34) + ""))
Any thoughts on what is going on here?
CodePudding user response:
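The issue is that the backslash has special meaning twice: once in a Python string literal and again in the regex pattern. The literal " \\\"" becomes the three characters space, backslash, double quote, and the regex engine then reads \" as an escaped double quote, so the pattern looks for a space followed directly by a quote. You can use a Python raw string, created by prefixing a string literal with 'r' or 'R', in which the backslash (\) is treated as a literal character, so the regex engine receives every backslash you typed.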
import re
print(" \\\"")
print(" " + chr(92) + chr(34) + "")
print(re.search(r" \\\"", " " + chr(92) + chr(34) + ""))
Output:
\"
\"
<re.Match object; span=(0, 3), match=' \\"'>
In the second example, print(re.search("\\\"", "" + chr(92) + chr(34) + ""))
outputs:
<re.Match object; span=(1, 2), match='"'>
where only the double quote is matched.
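To see why only the quote is matched, it can help to look at the characters the regex engine actually receives. The sketch below is only an illustration; the variable names (target, plain_pattern, raw_pattern) are made up for this example and are not part of the question.

```python
import re

# Target text: a backslash followed by a double quote (2 characters).
target = chr(92) + chr(34)

# Non-raw literal: Python collapses \\ to \ and \" to ", leaving 2 characters.
plain_pattern = "\\\""
# Raw literal: all 4 characters (three backslashes and a quote) are kept.
raw_pattern = r"\\\""

print(len(plain_pattern), list(plain_pattern))
print(len(raw_pattern), list(raw_pattern))

# The regex engine reads the 2-character pattern as "escaped quote",
# so it matches only the quote at index 1.
plain_match = re.search(plain_pattern, target)
# The 4-character pattern is "escaped backslash, escaped quote",
# so it matches both characters starting at index 0.
raw_match = re.search(raw_pattern, target)

print(plain_match.span())
print(raw_match.span())
```

The non-raw pattern is two characters long and matches span (1, 2); the raw pattern is four characters long and matches span (0, 2).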
You need to escape the backslash or use a raw string. If you use single quotes around the regexp, the double quote does not need to be escaped.
s = "" + chr(92) + chr(34) + ""
print(re.search("\\\\\"", s))
print(re.search(r"\\\"", s))
print(re.search(r'\\"', s))
Output:
<re.Match object; span=(0, 2), match='\\"'>
<re.Match object; span=(0, 2), match='\\"'>
<re.Match object; span=(0, 2), match='\\"'>
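As an alternative that is not part of the answer above, when the goal is to find a piece of text literally, re.escape from the standard library can build the pattern so that no backslashes have to be counted by hand. A minimal sketch, assuming the text to find is a space, a backslash and a double quote; the names needle and haystack are hypothetical:

```python
import re

# Literal text to find: space, backslash, double quote.
needle = " " + chr(92) + chr(34)

# re.escape adds the backslashes needed for the regex engine
# to treat every character of the needle literally.
pattern = re.escape(needle)

# Sample text that contains the needle at index 3.
haystack = "say" + needle + "hi"

match = re.search(pattern, haystack)
print(match.span())
print(match.group() == needle)
```

This avoids the two layers of escaping altogether, at the cost of not being able to use regex syntax inside the escaped part.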
For further details of the raw string and backslash, see answers to this question.