How do you insert extra validation logic into an ANTLR4 parser rule?

Time:08-19

I have an ANTLR4 grammar that has a parser rule line as below:

| expression operator='=' expression    #AssignmentExpression

This rule is part of a large compound rule that defines an expression. In reality, only a subset of expression types is valid on the left-hand side of an assignment, but because of left-recursion issues I cannot narrow the parser rule down to those specific expression subsets. What I want to do is insert custom code into the generated parser that runs when the rule is matched and evaluates the innermost type of the left-hand expression, to ensure it is one of the valid types. If it is not, ideally I would register a custom parser error, something like "Invalid expression on the left hand assignment. Root expression must be of type identifier or property reference." I'm sure there is a way to do this with ANTLR4, but I have not been able to find the proper method.

I am creating a lexer/parser for a language called Moo that is used in an object-based MUD environment. I noticed that the server parser (written using yacc/bison) takes a similar approach: it allows expression '=' expression, then interrogates the left-hand expression to ensure it is of the correct subtype, and otherwise generates a parser error. If this is not the correct way to do such a thing within ANTLR, I would love to be corrected and educated about the right way to achieve it.

For anyone curious about further details, the language only allows a property reference or identifier on the left-hand side; however, those could be indexed, so a[1] = 1 is still valid. This is why I need to check not only the expression type of the left-hand expression, but also its root expression type (in this case the identifier 'a').

CodePudding user response:

This seems a good example of a situation where many of us suggest NOT trying to shoehorn everything about your language into the grammar.

You have a grammar that correctly builds a parse tree for the only valid interpretation of the input stream. The parser has done its job for you.

Now, using a listener (maybe a visitor if you find it a better fit), you can identify this situation and report out exactly the error message you want.

You create your listener by inheriting from <grammarName>BaseListener and overriding the applicable enter* or exit* methods for whatever context node you're interested in (in your case enterAssignmentExpression()). You then use a tree walker to walk the parse tree you got back from the parser, calling your listener as it goes.

 ParseTreeWalker.DEFAULT.walk(<yourListener>, <yourParseTree>);

In this case, maybe label your expressions:

| lhs=expression operator='=' rhs=expression    #AssignmentExpression

Then, in the enterAssignmentExpression() method override, examine the lhs expression, and if it’s the wrong type of expression, add your error message to your collection of errors. The grammar remains straightforward and you can be as specific as you want with your error message.
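As a rough illustration of the check itself, here is a minimal, self-contained Java sketch. It does not use the ANTLR runtime: the node classes (Expr, Identifier, PropertyRef, Index, Literal) and the AssignmentTargets helper are hypothetical stand-ins for whatever context classes ANTLR generates from your grammar's labeled alternatives, and their names are assumptions. The idea is to strip index operations off the left-hand expression until the root is reached, then accept it only if the root is an identifier or a property reference.

```java
import java.util.Optional;

// Hypothetical stand-ins for the generated context classes of a Moo-like grammar.
interface Expr {}

final class Identifier implements Expr {
    final String name;
    Identifier(String name) { this.name = name; }
}

final class PropertyRef implements Expr {
    final Expr object;      // the expression whose property is referenced
    final String property;
    PropertyRef(Expr object, String property) { this.object = object; this.property = property; }
}

final class Index implements Expr {
    final Expr target;      // the expression being indexed, e.g. `a` in a[1]
    final Expr index;
    Index(Expr target, Expr index) { this.target = target; this.index = index; }
}

final class Literal implements Expr {
    final int value;
    Literal(int value) { this.value = value; }
}

final class AssignmentTargets {
    static final String MESSAGE =
        "Invalid expression on the left hand assignment. "
        + "Root expression must be of type identifier or property reference.";

    // Peel off any number of index operations to find the root expression.
    static Expr rootOf(Expr e) {
        while (e instanceof Index) {
            e = ((Index) e).target;
        }
        return e;
    }

    // Returns an error message if lhs is not a valid assignment target, else empty.
    static Optional<String> validate(Expr lhs) {
        Expr root = rootOf(lhs);
        if (root instanceof Identifier || root instanceof PropertyRef) {
            return Optional.empty();
        }
        return Optional.of(MESSAGE);
    }

    public static void main(String[] args) {
        Expr ok = new Index(new Identifier("a"), new Literal(1));   // a[1]
        Expr bad = new Index(new Literal(5), new Literal(1));       // 5[1]
        System.out.println(validate(ok).isPresent());   // false: a[1] is a valid target
        System.out.println(validate(bad).isPresent());  // true: root is a literal
    }
}
```

In a real listener, the same logic would presumably live in the enterAssignmentExpression() override and inspect the labeled lhs child of the context (instanceof tests against the generated context classes for your alternatives), appending the message to your own error collection instead of returning it.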
