Making generated parser work in Java for ANTLR 4.8

Time:12-06

I've been having trouble getting my generated parser to work in Java for ANTLR 4.8. There are other answers to this question, but it seems that ANTLR has changed things since 4.7 and all the other answers are before this change. My code is:

    String formula = "(fm.a < fm.b) | (fm.a = fm.b)";
    CharStream input = CharStreams.fromString(formula);
    Antlr.LogicGrammerLexer lexer = new Antlr.LogicGrammerLexer(input);
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    Antlr.LogicGrammerParser parser = new Antlr.LogicGrammerParser(tokens);
    ParseTree pt = new ParseTree(parser);

It appears to read the formula into the CharStream correctly, but nothing I try after that works. For example, if I try to print the parse tree, nothing is printed. The following line also prints nothing:

    System.out.println(lexer._input.getText(new Interval(0, 100)));

Any advice appreciated.

EDIT: added the grammar file:

grammar LogicGrammer;

logicalStmt: BOOL_EXPR | '('logicalStmt' '*LOGIC_SYMBOL' '*logicalStmt')';
BOOL_EXPR: '('IDENTIFIER' '*MATH_SYMBOL' '*IDENTIFIER')';
IDENTIFIER: CHAR+ ('.'CHAR*)*;
CHAR: 'a'..'z' | 'A'..'Z' | '1'..'9';
LOGIC_SYMBOL: '~' | '|' | '&';
MATH_SYMBOL: '<' | '≤' | '=' | '≥' | '>';

CodePudding user response:

The BOOL_EXPR shouldn't be a lexer rule. I suggest you do something like this instead:

grammar LogicGrammer;

parse
 : logicalStmt EOF
 ;

logicalStmt
 : logicalStmt LOGIC_SYMBOL logicalStmt
 | logicalStmt MATH_SYMBOL logicalStmt
 | '(' logicalStmt ')'
 | IDENTIFIER
 ;

IDENTIFIER
 : CHAR+ ( '.' CHAR+ )*
 ;

LOGIC_SYMBOL
 : [~|&]
 ;

MATH_SYMBOL
 : [<≤=≥>]
 ;

SPACE
 : [ \t\r\n] -> skip
 ;

fragment CHAR
 : [a-zA-Z1-9]
 ;

which can be tested by running the following code:

String formula = "(fm.a < fm.b) | (fm.a = fm.b)";
LogicGrammerLexer lexer = new LogicGrammerLexer(CharStreams.fromString(formula));
LogicGrammerParser parser = new LogicGrammerParser(new CommonTokenStream(lexer));
ParseTree root = parser.parse();
System.out.println(root.toStringTree(parser));
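
To see roughly which tokens the lexer rules above hand to the parser once BOOL_EXPR is no longer a lexer rule, here is a small hand-written sketch that needs no ANTLR at all. It is only an approximation built on java.util.regex, not ANTLR's actual lexing algorithm. The class name TokenSketch and the LITERAL label for parentheses are made up for illustration, and it assumes IDENTIFIER means one or more CHARs optionally followed by dot-separated groups of CHARs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenSketch {
    // Each regex group loosely mirrors one lexer rule from the grammar.
    // Java group names cannot contain '_', so RULES maps them to display labels.
    private static final Pattern TOKEN = Pattern.compile(
        "(?<SPACE>[ \\t\\r\\n])"
        + "|(?<IDENTIFIER>[a-zA-Z1-9]+(?:\\.[a-zA-Z1-9]+)*)"
        + "|(?<LOGIC>[~|&])"
        + "|(?<MATH>[<\u2264=\u2265>])"
        + "|(?<PAREN>[()])");

    private static final String[][] RULES = {
        {"IDENTIFIER", "IDENTIFIER"},
        {"LOGIC", "LOGIC_SYMBOL"},
        {"MATH", "MATH_SYMBOL"},
        {"PAREN", "LITERAL"},
    };

    // Splits the input into "LABEL:text" strings, dropping whitespace
    // the way a "-> skip" rule would.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("Unexpected character at index " + pos);
            }
            pos = m.end();
            if (m.group("SPACE") != null) {
                continue; // skipped, never reaches the parser
            }
            for (String[] rule : RULES) {
                if (m.group(rule[0]) != null) {
                    tokens.add(rule[1] + ":" + m.group());
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : tokenize("(fm.a < fm.b) | (fm.a = fm.b)")) {
            System.out.println(token);
        }
    }
}
```

For the formula in the question this yields 11 tokens, starting with LITERAL:(, IDENTIFIER:fm.a and MATH_SYMBOL:<. The point is that the parentheses, identifiers and operators arrive as separate tokens, so combining them is left to parser rules such as logicalStmt.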

CodePudding user response:

This line:

ParseTree pt = new ParseTree(parser);

is incorrect. You need to call the start rule method on your parser object to get your parse tree:

Antlr.LogicGrammerParser parser = new Antlr.LogicGrammerParser(tokens);
ParseTree pt = parser.logicalStmt();

As for printing out your input: fields starting with an underscore (like _input) are generally not intended for external use. I suspect the failure may be that your input stream doesn't have 100 characters, so the Interval is invalid, though I haven't tried it to see the exact failure.
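
The index arithmetic behind that suspicion can be checked with plain Java strings, without assuming anything about how the ANTLR runtime treats an out-of-range Interval. This is only a sketch: the class IntervalBounds and its helpers are hypothetical and not part of ANTLR. The one ANTLR fact it relies on is that an Interval's two bounds are both inclusive.

```java
public class IntervalBounds {
    // Last valid inclusive character index of a non-empty string.
    static int lastIndex(String text) {
        return text.length() - 1;
    }

    // Returns the characters from start to stop, both inclusive,
    // clamping the bounds so they never run past the string.
    static String slice(String text, int start, int stop) {
        int from = Math.max(0, Math.min(start, text.length()));
        int to = Math.max(from, Math.min(stop + 1, text.length()));
        return text.substring(from, to);
    }

    public static void main(String[] args) {
        String formula = "(fm.a < fm.b) | (fm.a = fm.b)";
        System.out.println(formula.length());       // number of characters in the input
        System.out.println(lastIndex(formula));     // largest index an interval may end at
        System.out.println(slice(formula, 1, 28));  // everything after the first '('
        System.out.println(slice(formula, 0, 100)); // clamped to the whole formula
    }
}
```

The formula is 29 characters long, so its valid indices run from 0 to 28 and an interval ending at 100 reaches well past the end of the input. Deriving the upper bound from the input length sidesteps the question of what the runtime does with an oversized interval.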

If you include your grammar, one of us could easily generate and compile it and, perhaps, be more specific.


Using your grammar, this works for me:

import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.misc.Interval;
import org.antlr.v4.runtime.tree.ParseTree;

public class Logic {
    public static void main(String... args) {
        String formula = "(fm.a < fm.b) | (fm.a = fm.b)";
        CharStream input = CharStreams.fromString(formula);
        LogicGrammerLexer lexer = new LogicGrammerLexer(input);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        LogicGrammerParser parser = new LogicGrammerParser(tokens);
        ParseTree pt = parser.logicalStmt();
        System.out.println(pt.toStringTree());
        System.out.println(input.getText(new Interval(1, 28)));
    }
}

output:

([] (fm.a < fm.b))
fm.a < fm.b) | (fm.a = fm.b)

BTW, a couple of minor suggestions for your grammar:

  • set up a rule to skip whitespace: WS: [ \t\r\n]+ -> skip;
  • change BOOL_EXPR to a parser rule (since it's made up of a composition of tokens from other lexer rules):

grammar LogicGrammer;

logicalStmt
    : boolExpr
    | '(' logicalStmt LOGIC_SYMBOL logicalStmt ')'
    ;
boolExpr:     '(' IDENTIFIER MATH_SYMBOL IDENTIFIER ')';
IDENTIFIER:   CHAR+ ('.' CHAR*)*;
CHAR:         'a'..'z' | 'A'..'Z' | '1'..'9';
LOGIC_SYMBOL: '~' | '|' | '&';
MATH_SYMBOL:  '<' | '≤' | '=' | '≥' | '>';
WS:           [ \t\r\n]+ -> skip;