Java expression parsing with ANTLR

Time:11-04

I'm writing a toolkit in Java that uses Java expression parsing. I thought I'd try using ANTLR since

  1. It seems to be used ubiquitously for this sort of thing
  2. There don't seem to be a lot of open source alternatives
  3. I actually tried to write my own generalized parser a while back and gave up. That stuff's hard.

I have to say, after what I feel is a lot of reading and trying different things (more than I had expected to spend, anyway), ANTLR seems incredibly difficult to use. The API is very unintuitive--I'm never quite sure whether I'm calling it right.

Although ANTLR tutorials and examples abound, I haven't had luck finding any examples that involve parsing Java "expressions" -- everyone else seems to want to parse whole java files.

I started off calling it like this:

        Java8Lexer lexer = new Java8Lexer(CharStreams.fromString(text));
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        Java8Parser parser = new Java8Parser(tokens);
        ParseTree result = parser.expression();

but that wouldn't parse the whole expression. E.g. with text "a.b" it would return a result that only consisted of the "a" part, just quitting after the first thing it could parse.

Fine. So I changed to:

        String input = "return " + text + ";";
        Java8Lexer lexer = new Java8Lexer(CharStreams.fromString(input));
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        Java8Parser parser = new Java8Parser(tokens);
        ParseTree result = parser.returnStatement();
        result = result.getChild(1);

thinking this would force it to parse the entire expression, then I could just extract the part I cared about. That worked for name expressions like "a.b", but if I try to parse a method expression like "a.b.c(d)" it gives an error:

line 1:12 mismatched input '(' expecting '.'

Interestingly, a(), a.b(), and a.b.c parse fine, but a.b.c() also dies with the same error.

Is there an ANTLR expert here who might have an idea what I'm doing wrong?

Separately, it bothers me quite a bit that the error above is printed to stderr, but I can't find it in the result object anywhere. I'd like to be able to present that error message (vague as it is) to the user that entered the expression--they may not be looking at a console, and even if they are, there's no context there. Is there a way to find that information in the result I get back?

Any help is greatly appreciated.

CodePudding user response:

For a rule like expression, ANTLR will stop parsing once it recognizes an expression.

You can force it to continue by adding an `EOF` to your start rule.

You don’t want to modify the actual `expression` rule, but you can add a rule like this:

        expressionStart: expression EOF;

Then you can use:

        ParseTree result = parser.expressionStart();

This will force ANTLR to continue parsing your input until it reaches the end of your input.
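To make the difference concrete without needing the generated Java8 parser, here is a small stand-alone sketch. It is not ANTLR and not the Java8 grammar: `EofAnchorDemo`, `expr`, `exprStart` and the tiny grammar in the comment are all made up for illustration. It only shows the general idea that a rule without an end-of-input check is satisfied by a prefix, while a wrapper rule that demands end of input turns the leftover text into an error. The errors are collected in a list rather than printed, which is roughly what you would want for showing them to a user.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy recognizer (NOT ANTLR). Grammar: expr : NAME ('.' NAME | '(' ')')* ; */
public class EofAnchorDemo {
    private final String src;
    private int pos = 0;

    /** Messages are collected here instead of being written to stderr. */
    public final List<String> errors = new ArrayList<>();

    public EofAnchorDemo(String src) {
        this.src = src;
    }

    /** expr with no end-of-input check: returns whatever prefix it matched. */
    public String expr() {
        int start = pos;
        if (!name()) {
            errors.add("col " + pos + ": expected a name");
            return "";
        }
        while (pos < src.length()) {
            int save = pos;
            if (src.charAt(pos) == '.') {
                pos++;
                if (!name()) {      // a '.' with no name after it: back up and stop
                    pos = save;
                    break;
                }
            } else if (src.startsWith("()", pos)) {
                pos += 2;           // empty argument list
            } else {
                break;              // something expr does not know: stop quietly
            }
        }
        return src.substring(start, pos);
    }

    /** exprStart : expr EOF ; any leftover input is recorded as an error. */
    public String exprStart() {
        String matched = expr();
        if (pos < src.length()) {
            errors.add("col " + pos + ": unexpected '" + src.charAt(pos) + "'");
        }
        return matched;
    }

    /** Consumes one identifier at the current position, if there is one. */
    private boolean name() {
        if (pos >= src.length() || !Character.isJavaIdentifierStart(src.charAt(pos))) {
            return false;
        }
        pos++;
        while (pos < src.length() && Character.isJavaIdentifierPart(src.charAt(pos))) {
            pos++;
        }
        return true;
    }

    public static void main(String[] args) {
        EofAnchorDemo loose = new EofAnchorDemo("a.b+c");
        System.out.println(loose.expr() + " " + loose.errors);        // a.b []

        EofAnchorDemo strict = new EofAnchorDemo("a.b+c");
        System.out.println(strict.exprStart() + " " + strict.errors); // a.b [col 3: unexpected '+']
    }
}
```

Both calls match the same prefix `a.b`; only the end-anchored one tells you that `+c` was never consumed. In ANTLR itself the equivalent of the `errors` list would be an error listener registered on the parser with `addErrorListener`, which is not shown here.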
