Using the Python ast parser to process multi-line strings

Time:06-13

When I use Python's ast module on scripts that contain multi-line strings, those strings are always reduced to single-line quoted strings when the tree is unparsed. Example:

import ast

script = "text='''Line1\nLine2'''"

code = ast.parse(script, mode='exec')
print(ast.unparse(code))

node = code.body[0].value
print(node.lineno, node.end_lineno)

The output is:

> text = 'Line1\nLine2'
> 1 2

So although the text was a multi-line string before parsing, and the node's lineno/end_lineno still show that it spans two lines, it is written back as a single-line quoted string when unparsed. This makes script transformation difficult, because the multi-line formatting is lost whenever a transformed AST is unparsed.
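For what it's worth, the original spelling is not gone from the tree: the node's position attributes still cover both lines, and the standard ast.get_source_segment function (Python 3.8+) can slice the literal back out of the source. A minimal check, reusing the script from the example:

```python
import ast

script = "text='''Line1\nLine2'''"
tree = ast.parse(script)
node = tree.body[0].value  # the Constant node holding the string

# The node's value is only the decoded text, with no record of the quoting...
print(repr(node.value))
# ...but its source span still covers the original triple-quoted literal.
print(ast.get_source_segment(script, node))
```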

Is there a way to parse and unparse scripts with multi-line strings correctly using ast?

Thank you in advance.

Answer:

Looking at the source behind ast.unparse (the private class ast._Unparser) shows that visit_Constant delegates to _write_constant, which writes the repr() of a string unless backslash avoidance has been explicitly enabled:

class _Unparser(NodeVisitor):
    def _write_constant(self, value):
        if isinstance(value, (float, complex)):
            ...  # special handling for inf/nan (elided)
        elif self._avoid_backslashes and isinstance(value, str):
            self._write_str_avoiding_backslashes(value)
        else:
            self.write(repr(value))
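As far as I can tell from the CPython source, the backslash-avoiding writer is already used for docstrings, which are emitted with triple quotes, so only ordinary string constants take the repr() branch shown above. A quick comparison using only the public ast.unparse (a sketch; exact formatting may vary between Python versions):

```python
import ast

# A docstring and an ordinary assigned string, both spanning two source lines
source = 'def f():\n    """Line1\n    Line2"""\n    text = """Line1\n    Line2"""\n'
out = ast.unparse(ast.parse(source))
print(out)
```

The docstring should keep its line break, while the assigned string comes back as a single-line literal with an escaped \n.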

By default, _avoid_backslashes is False. However, multi-line strings can be formatted properly by overriding visit_Constant and calling _write_str_avoiding_backslashes directly when the string node spans more than one source line (note that ast._Unparser and its helpers are private and may change between Python versions):

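import ast

class Unparser(ast._Unparser):
    def visit_Constant(self, node):
        # Transformed or hand-built nodes may lack positions, so read them defensively
        start = getattr(node, "lineno", None)
        end = getattr(node, "end_lineno", None)
        if isinstance(node.value, str) and start is not None and end is not None and start < end:
            # The literal spanned several source lines: write it triple-quoted
            self._write_str_avoiding_backslashes(node.value)
        else:
            super().visit_Constant(node)

def _unparse(ast_node):
    # visit() resets the output buffer and returns the generated source text
    return Unparser().visit(ast_node)

script = "text='''Line1\nLine2'''"
print(_unparse(ast.parse(script)))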

Output:

text = """Line1
Line2"""
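Since the question is about transforming scripts, here is a hedged sketch of a full round trip. UpperStrings is a hypothetical NodeTransformer that edits every string, and MultilineUnparser decides per literal from its original source span, falling back to checking for a newline when a node has no positions. It relies on the same private internals as the answer (ast._Unparser and _write_str_avoiding_backslashes); the lazy-loading guard at the top is an assumption about newer CPython releases.

```python
import ast

# Assumption: ast._Unparser and _write_str_avoiding_backslashes are private
# CPython internals, not public API. If the class is missing, assume it is
# loaded lazily and trigger one ast.unparse() call before looking it up.
if not hasattr(ast, "_Unparser"):
    ast.unparse(ast.Module(body=[], type_ignores=[]))


class MultilineUnparser(ast._Unparser):
    """Unparser that writes originally multi-line string literals triple-quoted."""

    def visit_Constant(self, node):
        if isinstance(node.value, str) and self._spans_lines(node):
            self._write_str_avoiding_backslashes(node.value)
        else:
            super().visit_Constant(node)

    @staticmethod
    def _spans_lines(node):
        start = getattr(node, "lineno", None)
        end = getattr(node, "end_lineno", None)
        if start is not None and end is not None:
            # Positions available: trust the literal's original source span.
            return start < end
        # No positions (node built by hand): fall back to the string's content.
        return "\n" in node.value


class UpperStrings(ast.NodeTransformer):
    """Hypothetical transformation: upper-case every string constant."""

    def visit_Constant(self, node):
        if isinstance(node.value, str):
            # copy_location carries lineno/end_lineno over to the new node.
            return ast.copy_location(ast.Constant(value=node.value.upper()), node)
        return node


# One literal written across two lines, one written on one line with an escape
script = "text='''Line1\nLine2'''\nlabel = 'a\\nb'\n"
tree = UpperStrings().visit(ast.parse(script))
result = MultilineUnparser().visit(tree)
print(result)
```

Because the decision is based on each literal's original span rather than on its content, the two-line literal keeps its triple quotes after the transformation, while 'a\nb', which was written on one line, stays a single-line literal with an escape.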