The re.escape() documentation says:
Changed in version 3.3: The '_' character is no longer escaped.
Changed in version 3.7: Only characters that can have special meaning in a regular expression are escaped. As a result, '!', '"', '%', "'", ',', '/', ':', ';', '<', '=', '>', '@', and "`" are no longer escaped.
The question is: why is the # character still escaped?
CodePudding user response:
With the re.X / re.VERBOSE option, the # character becomes special (as does any literal whitespace).
Check the code snippet below:
import re

pattern = "# Something"
text = "Here is # Something"
print(re.search(pattern, text))
# => <re.Match object; span=(8, 19), match='# Something'>
print(re.search(pattern, text, re.X))
# => <re.Match object; span=(0, 0), match=''>
With re.search(pattern, text, re.X) the text # Something is not matched: the result is an empty match at position 0, because the whole pattern is parsed as a comment. In verbose mode an unescaped # starts a single-line comment, and everything after it up to the line break is ignored, which leaves an empty pattern.
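The same rule applies to literal whitespace, and a comment really does stop at the line break. Here is a minimal sketch of both effects; the variable names and sample strings are made up for illustration only:

```python
import re

# In verbose mode, unescaped whitespace in the pattern is dropped,
# so "a b" is compiled as if it were "ab".
no_match = re.search("a b", "a b", re.X)
print(no_match)        # None: the text contains no "ab"
squeezed = re.search("a b", "ab", re.X)
print(squeezed)        # matches 'ab'

# A comment ends at the line break; the next line is pattern text again.
two_lines = re.search("Here # a comment\n is", "Hereis", re.X)
print(two_lines)       # matches 'Hereis'
```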
So re.escape escapes # (and whitespace), so that it is still treated as a literal character when re.X / re.VERBOSE is used:
print(re.search(re.escape(pattern), text, re.X))
# => <re.Match object; span=(8, 19), match='# Something'>
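This matters most when a literal string is embedded in a larger, commented verbose pattern. Below is a rough sketch of that situation. The surrounding pattern and the names literal, escaped and verbose_pattern are hypothetical and not taken from the question:

```python
import re

literal = "# Something"
escaped = re.escape(literal)
print(escaped)  # \#\ Something  (both '#' and the space are escaped)

# A verbose pattern with its own comments; the escaped literal is
# interpolated so that its '#' and space are matched as plain text.
verbose_pattern = rf"""
    is \s+          # the word "is" followed by whitespace
    ({escaped})     # the embedded text, matched literally
"""
m = re.search(verbose_pattern, "Here is # Something", re.VERBOSE)
print(m.group(1))  # # Something
```

Without re.escape, the # inside the interpolated text would start a comment and swallow the rest of that pattern line.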