Antlr4 picks up wrong tokens and rules

Time:01-08

I have a grammar along these lines:

method_declaration : protection? expression identifier LEFT_PARENTHESES (method_argument (COMMA method_argument)*)? RIGHT_PARENTHESES method_block;

expression
    : ...
    | ...
    | identifier
    | kind
    ;

identifier : IDENTIFIER ;
kind : ... | ... | VOID_KIND; // void for example there are more

IDENTIFIER : (LETTER | '_') (LETTER | DIGIT | '_')*;
VOID_KIND : 'void';

fragment LETTER : [a-zA-Z];
fragment DIGIT : [0-9];

*The other rules referenced in method_declaration are not relevant to this question.

What happens is that when I input something such as void Start() { } and look at the ParseTree, it seems to think void is an identifier and not a kind, and treats it as such.

I tried changing the order in which kind and identifier are written in the .g4 file, but it doesn't seem to make any difference. Why does this happen and how can I fix it?

CodePudding user response:

The order in which parser rules are defined makes no difference in ANTLR. The order in which token rules are defined does though. Specifically, when multiple token rules match on the current input and produce a token of the same length, ANTLR will apply the one that is defined first in the grammar.

So in your case, you'll want to move VOID_KIND (and any other keyword lexer rules you may have) before IDENTIFIER. That is pretty much what you already tried, except with the lexer rules instead of the parser rules.
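
As a minimal sketch of that reordering, using only the lexer rules shown in the question (any other keyword tokens in the real grammar are assumed to follow the same pattern and would sit next to VOID_KIND):

```antlr
// Keyword rules come first, so they win the tie against IDENTIFIER
// when both match the same text at the same length.
VOID_KIND  : 'void';

// The general identifier rule comes after all keyword rules.
IDENTIFIER : (LETTER | '_') (LETTER | DIGIT | '_')*;

fragment LETTER : [a-zA-Z];
fragment DIGIT  : [0-9];
```

With this order, the input void is lexed as VOID_KIND. Longer input such as voidable is still lexed as IDENTIFIER, because the longest match takes priority and rule order only breaks ties between matches of equal length.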

PS: I'm somewhat surprised that ANTLR doesn't produce a warning about VOID_KIND being unmatchable in this case. I'm pretty sure other lexer generators would produce such a warning in cases like this.
