Why does Python re.escape() escape the "#" character?

Time:10-28

The re.escape() documentation says:

Changed in version 3.3: The '_' character is no longer escaped.

Changed in version 3.7: Only characters that can have special meaning in a regular expression are escaped. As a result, '!', '"', '%', "'", ',', '/', ':', ';', '<', '=', '>', '@', and "`" are no longer escaped.
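The quoted changes can be checked directly. This is a minimal sketch, assuming Python 3.7 or later; the variable names are illustrative only:

```python
import re

# Characters named in the 3.3 and 3.7 change notes pass through unchanged.
unchanged = re.escape("_!\"%',/:;<=>@`")

# '#' and a literal space are still escaped, as is a metacharacter like '.'.
hash_escaped = re.escape("#")
space_escaped = re.escape(" ")
dot_escaped = re.escape(".")

print(unchanged)
print(hash_escaped, repr(space_escaped), dot_escaped)
```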

The question is: why is the # character still escaped?

CodePudding user response:

When the re.X / re.VERBOSE option is used, the # character becomes special (as does any literal whitespace).
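The whitespace half of that statement can be illustrated with a short sketch. The sample strings here are made up for the example:

```python
import re

text = "a b"

# Without re.X, the space in the pattern is a literal space.
plain = re.search("a b", text)

# With re.X, the unescaped space is dropped, so the pattern is effectively "ab".
verbose = re.search("a b", text, re.X)

# Inside a character class, whitespace and '#' stay literal even with re.X.
in_class = re.search("a[ ]b", text, re.X)
hash_in_class = re.search("[#]", "x # y", re.X)

print(plain, verbose, in_class, hash_in_class)
```

Here only the second search fails to find anything, because "ab" does not occur in "a b".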

Check the code snippet below:

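import re

pattern = "# Something"
text = "Here is # Something"

# Default mode: '#' and the space are ordinary literal characters.
print(re.search(pattern, text))
# => <re.Match object; span=(8, 19), match='# Something'>

# Verbose mode: '#' starts a comment, so the pattern is effectively empty.
print(re.search(pattern, text, re.X))
# => <re.Match object; span=(0, 0), match=''>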


With re.search(pattern, text, re.X), the intended text is not matched, because # Something is parsed as a comment. The # marks the start of a single-line comment, and everything after it up to the next line break is ignored. The pattern is therefore effectively empty, and it matches the empty string at position 0.
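To show that a comment ends at the line break, here is a small sketch with a two-line verbose pattern. The pattern and input are invented for illustration:

```python
import re

# Each comment runs from '#' to the end of its own pattern line;
# the next line is parsed as pattern text again.
two_line_pattern = "a  # first letter\nb  # second letter"

# Effectively the pattern "ab".
match = re.search(two_line_pattern, "xaby", re.X)
print(match)

# A pattern that is nothing but a comment still "matches": it finds
# the empty string at the start of the text.
empty = re.search("# Something", "Here is # Something", re.X)
print(repr(empty.group()), empty.span())
```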

So re.escape escapes # to make sure it is treated as a literal character even when re.X / re.VERBOSE is used:

print(re.search(re.escape(pattern), text, re.X))
# => <re.Match object; span=(8, 19), match='# Something'>
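This matters most when arbitrary text is interpolated into a larger verbose pattern. The following is a sketch only: compile_line_matcher is a hypothetical helper written for this example, not part of the re module.

```python
import re

def compile_line_matcher(literal):
    """Hypothetical helper: match a string consisting only of `literal`,
    allowing surrounding whitespace."""
    return re.compile(
        rf"""
        ^ \s*                  # optional leading whitespace
        {re.escape(literal)}   # caller-supplied text, taken literally
        \s* $                  # optional trailing whitespace
        """,
        re.X,
    )

matcher = compile_line_matcher("# Something")
found = matcher.search("   # Something  ")
missing = matcher.search("Something")
print(found, missing)
```

If the literal were interpolated without re.escape, its # would start a comment and swallow the rest of that pattern line, so the compiled pattern would match only blank input.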