I'm trying to generate the parse tree for Antlr4 Python3.g4 grammar file, to parse python3 code


I'm using ANTLR4 and trying to generate a parse tree for a Python file I have. I used the grammar file Python3.g4 from the ANTLR4 documentation. I have antlr4-python3-runtime installed, and I ran this command:

antlr4 -Dlanguage=Python3 Python3.g4

This generated my parser and lexer files.

In Python3Lexer.py, I had errors for:

from typing.io import TextIO

so I changed it to:

from typing import TextIO

I also created a file called pythonparser.py, in the same folder as the parser and lexer files, to call the parser:

import sys
from antlr4 import *
from Python3Lexer import Python3Lexer
from Python3Parser import Python3Parser

def main(argv):
    input_stream = FileStream(argv[1])
    lexer = Python3Lexer(input_stream)
    stream = CommonTokenStream(lexer)
    parser = Python3Parser(stream)
    tree = parser.single_input()

if __name__ == '__main__':
    main(sys.argv)

I have also made a test.py file, which is in the same folder as the antlr grammars, with:

print("hello world")

I tried to run the grammar on this file to parse it, using the command:

python3 pythonparser.py test.py

It doesn't work for me, and I'm not sure what to do.

I receive this error message:

Traceback (most recent call last):
  File "/Users/Fari/Developer/PRJ/project/antlr/pythonparser.py", line 3, in <module>
    from Python3Lexer import Python3Lexer
  File "/Users/Fari/Developer/PRJ/project/antlr/Python3Lexer.py", line 19, in <module>
    LanguageParser = getattr(importlib.import_module('{}Parser'.format(module_path)), '{}Parser'.format(language_name))
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/Users/Fari/Developer/PRJ/project/antlr/Python3Parser.py", line 446, in <module>
    class Python3Parser ( Parser ):
  File "/Users/Fari/Developer/PRJ/project/antlr/Python3Parser.py", line 450, in Python3Parser
    atn = ATNDeserializer().deserialize(serializedATN())
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/antlr4/atn/ATNDeserializer.py", line 60, in deserialize
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/antlr4/atn/ATNDeserializer.py", line 90, in reset
    temp = [ adjust(c) for c in data ]
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/antlr4/atn/ATNDeserializer.py", line 90, in <listcomp>
    temp = [ adjust(c) for c in data ]
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/antlr4/atn/ATNDeserializer.py", line 88, in adjust
    v = ord(c)
TypeError: ord() expected string of length 1, but int found

I'm not sure where I'm going wrong.

CodePudding user response:

There are a lot of Python grammars. The ones you need are Python3Lexer.g4 and Python3Parser.g4 from the python/python3 folder of the antlr/grammars-v4 repository.

After you've downloaded both grammars, preprocess them by running transformGrammar.py (from the Python3 subfolder of that same location) in the folder that contains the 2 grammar files.

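Now download these 2 classes into the same folder: Python3LexerBase.py and Python3ParserBase.py, from the python/python3/Python3 folder of antlr/grammars-v4.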

When that is all done, generate the lexer and parser Python classes:

java -jar antlr-4.11.1-complete.jar *.g4 -Dlanguage=Python3

If you now run the following file:

from antlr4 import *
from Python3Lexer import Python3Lexer
from Python3Parser import Python3Parser

def main():
    input_stream = InputStream('print("hello world")\n')
    lexer = Python3Lexer(input_stream)
    stream = CommonTokenStream(lexer)
    parser = Python3Parser(stream)
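    tree = parser.single_input()
    # Print the parse tree in LISP-style notation, using the parser's rule names.
    print(tree.toStringTree(recog=parser))

if __name__ == '__main__':
    main()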

the following output will be printed:

(single_input (simple_stmts (simple_stmt (expr_stmt (testlist_star_expr (test (or_test (and_test (not_test (comparison (expr (xor_expr (and_expr (shift_expr (arith_expr (term (factor (power (atom_expr (atom (name print)) (trailer ( (arglist (argument (test (or_test (and_test (not_test (comparison (expr (xor_expr (and_expr (shift_expr (arith_expr (term (factor (power (atom_expr (atom "hello world"))))))))))))))))) ))))))))))))))))))) \n))

Note that I did not change anything else (there was no need to replace typing.io with typing). I used:

  • Python 3.10.9
  • ANTLR 4.11.1
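
If you want to check that the installed runtime lines up with the jar you generated the code with, something like the sketch below might help. It is not part of the setup above: the helper names are made up for illustration, and treating an equal major.minor as "matching" is an assumption on my part. A mismatch between tool and runtime is a plausible cause of the deserializer error in the question, but I have not confirmed that for this case.

```python
from importlib import metadata


def installed_runtime_version(dist="antlr4-python3-runtime"):
    """Return the installed version string of the distribution, or None if absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def same_release(tool_version, runtime_version):
    """Compare only major.minor (assumption: patch differences are harmless)."""
    if runtime_version is None:
        return False
    return tool_version.split(".")[:2] == runtime_version.split(".")[:2]


if __name__ == "__main__":
    # Version of the antlr-*-complete.jar used to generate the lexer and parser.
    tool = "4.11.1"
    runtime = installed_runtime_version()
    print(f"tool={tool} runtime={runtime} match={same_release(tool, runtime)}")
```

If the two differ, either regenerating the lexer and parser with a jar of the runtime's version or installing the runtime version that corresponds to the jar should bring them back in line.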


When I stick the following in a file:

#!/usr/bin/env bash
wget https://raw.githubusercontent.com/antlr/grammars-v4/master/python/python3/Python3Lexer.g4
wget https://raw.githubusercontent.com/antlr/grammars-v4/master/python/python3/Python3Parser.g4
wget https://raw.githubusercontent.com/antlr/grammars-v4/master/python/python3/Python3/transformGrammar.py
wget https://raw.githubusercontent.com/antlr/grammars-v4/master/python/python3/Python3/Python3LexerBase.py
wget https://raw.githubusercontent.com/antlr/grammars-v4/master/python/python3/Python3/Python3ParserBase.py
wget https://www.antlr.org/download/antlr-4.11.1-complete.jar

python3 transformGrammar.py

pip install antlr4-python3-runtime

java -jar antlr-4.11.1-complete.jar *.g4 -Dlanguage=Python3

cat << EOF > main.py
from antlr4 import *
from Python3Lexer import Python3Lexer
from Python3Parser import Python3Parser

def main():
    input_stream = InputStream('print("hello world")\n')
    lexer = Python3Lexer(input_stream)
    stream = CommonTokenStream(lexer)
    parser = Python3Parser(stream)
    tree = parser.single_input()
    print(tree.toStringTree(recog=parser))

if __name__ == '__main__':
    main()
EOF

python3 --version

python3 main.py

and run this file, I get the following output:


antlr-4.11.1-complete.jar              100%[============================================================================>]   3,38M  9,33MB/s    in 0,4s

2023-01-31 10:51:47 (9,33 MB/s) - ‘antlr-4.11.1-complete.jar’ saved [3547867/3547867]

Altering Python3Lexer.g4
Writing ...
Altering Python3Parser.g4
Writing ...
Requirement already satisfied: antlr4-python3-runtime in /opt/homebrew/lib/python3.10/site-packages (4.11.1)
Python 3.10.9
(single_input (simple_stmts (simple_stmt (expr_stmt (testlist_star_expr (test (or_test (and_test (not_test (comparison (expr (xor_expr (and_expr (shift_expr (arith_expr (term (factor (power (atom_expr (atom (name print)) (trailer ( (arglist (argument (test (or_test (and_test (not_test (comparison (expr (xor_expr (and_expr (shift_expr (arith_expr (term (factor (power (atom_expr (atom "hello world"))))))))))))))))) ))))))))))))))))))) \n))