When using the python AST parser module in combination with scripts containing multi line strings, these multi line strings are always reduced to single line quoted strings. Example:
import ast
script = "text='''Line1\nLine2'''"
code = ast.parse (script, mode='exec')
print (ast.unparse (code))
node = code.body[0].value
print (node.lineno, node.end_lineno)
The output is:
> text = 'Line1\nLine2'
> 1 2
So in spite of being a multi line string before parsing, the text is reduced to a single line quoted string when unparsed. This makes script transformation difficult, because the multi lines are getting lost when unparsing a transformed AST graph.
Is there a way to parse/unparse scripts with multi line strings correctly with AST ?
Thank you in advance.
CodePudding user response:
An examination of ast.unparse
's underlying source reveals that the writer for the visit_Constant
method, _write_constant
, will produce the string repr
unless the backslashing process is specifically avoided:
class _Unparse:
def _write_constant(self, value):
if isinstance(value, (float, complex)):
...
elif self._avoid_backslashes and isinstance(value, str):
self._write_str_avoiding_backslashes(value)
else:
self.write(repr(value))
By default, _avoid_backslashes
is set to False
, however, multiline string formatting can be properly performed by overriding visit_Constant
and specifically calling _write_str_avoiding_backslashes
if the string node is multiline:
import ast
class Unparser(ast._Unparser):
def visit_Constant(self, node):
if isinstance(node.value, str) and node.lineno < node.end_lineno:
super()._write_str_avoiding_backslashes(node.value)
return
return super().visit_Constant(node)
def _unparse(ast_node):
u = Unparser()
return u.visit(ast_node)
script = "text='''Line1\nLine2'''"
print(_unparse(ast.parse(script)))
Output:
text = """Line1
Line2"""