Using Python's re.sub to change all instances of escaped three digits to a UTF-8 character

Time:04-09

I have the following string:

string = r"string\032with\032backslash\032\092\032and\010new\035line"

What I want to do is replace every backslash-escaped triple of digits (each meant to be read as a decimal number) with the corresponding character, using chr().

What I tried to do was

re.sub('(\\[0-9]{3})', chr('\1'), string)

as re.sub allows users to use matched groups in replacement. But this does not work. What would be the correct way to do this?

EDIT:

string = "string" + chr(32) + 'with' + chr(32) + 'backslash' + chr(32) + chr(92) + chr(32) + 'and' + chr(10) + 'new' + chr(35) + 'line'

returns (correctly)

string with backslash \ and
new#line

Answer:

You made 2 mistakes:

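  1. Your pattern needs to be a raw string as well. Otherwise the '\\' becomes a single \, which has special meaning inside a regex: \[ then matches a literal bracket instead of starting a character class.
  2. If you want to make any changes to the matched text before it is inserted (here: remove the \ and convert the number into an integer and then into a character), you need to pass a function as the replacement. chr('\1') is evaluated before re.sub is ever called, so it never sees the match, and it raises a TypeError because chr() expects an integer.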
>>> re.sub(r'(\\[0-9]{3})', lambda match: chr(int(match.group(0)[1:])), string)
'string with backslash \\ and\nnew#line'
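
The same approach can be written as a self-contained script. This is only a sketch, assuming every escape is a backslash followed by exactly three decimal digits. The names `ESCAPE` and `decode_escapes` are illustrative and do not come from the question. Here the capture group holds only the digits, so no slicing is needed.

```python
import re

# Mistake 1 in isolation: without the r prefix, '\\[' reaches the regex
# engine as \[ (an escaped literal bracket), not as a backslash plus a class.
assert '(\\[0-9]{3})' == r'(\[0-9]{3})'

# A backslash followed by exactly three decimal digits.
# The group captures the digits only. (ESCAPE is a hypothetical name.)
ESCAPE = re.compile(r'\\([0-9]{3})')


def decode_escapes(text):
    # Hypothetical helper: re.sub calls the function once per match and
    # inserts whatever string it returns in place of that match.
    # int() parses base 10, so the leading zero in '032' is harmless.
    return ESCAPE.sub(lambda m: chr(int(m.group(1))), text)


string = r"string\032with\032backslash\032\092\032and\010new\035line"
result = decode_escapes(string)
print(repr(result))  # 'string with backslash \\ and\nnew#line'
```

Compiling the pattern once is optional; it mainly keeps the pattern and the conversion logic in one named place if the function is called repeatedly.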