Good way to test that lexical tokens are accurate


I have started writing a grammar by defining all the lexical tokens. Let me just give a made-up basic example:

// TestLexer.g4
// Be able to lex: x 4 1.2
lexer grammar TestLexer;
VAR : [a-z] ;
PLUS: ' ' ;
INT : [0-9] ;
DEC : [0-9] '.' [0-9] ;

Now, I want to make sure that all my lexical tokens are correct and accurate. Of course the end product is an actual valid parser, but in the meantime I want to confirm that the tokens on their own are OK (it is trivial in the example above, but my actual grammar has 100 tokens, some quite complex).

Is there a suggested way to validate that all my tokens are OK? As an example, I thought the most basic approach might be something like:

program : anyToken* EOF ;
anyToken : PLUS | INT | VAR | DEC ; // parser rule (lowercase), so each token keeps its own type

It becomes a bit tedious to add every single token (and to keep the list in sync as tokens change). Is the above the best way to do it, or are there other, better ways (perhaps even a mode in the TestRig itself to do this)?
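One way to sidestep that hand-maintained list entirely is to skip the parser and drive the generated lexer directly from a few lines of Java. A minimal sketch, assuming the TestLexer above has already been generated and compiled (the class name LexerCheck and the sample input are made up for illustration):

import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.Token;

public class LexerCheck {
    public static void main(String[] args) {
        // Lex a sample input with the generated lexer; no parser rules needed.
        TestLexer lexer = new TestLexer(CharStreams.fromString("x 1 1.2"));
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        tokens.fill(); // run the lexer all the way to EOF

        // Map each numeric token type back to its rule name via the
        // generated vocabulary, so no token has to be listed by hand.
        for (Token t : tokens.getTokens()) {
            System.out.printf("%-4s '%s'%n",
                    TestLexer.VOCABULARY.getSymbolicName(t.getType()),
                    t.getText());
        }
    }
}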


What worked (from Mike's answer): grun reads the input from stdin, so type it and then signal EOF with Ctrl+D (Ctrl+Z plus Enter on Windows). Each output line has the form [@tokenIndex,startChar:stopChar='text',<type>,line:column].

$ antlr4 TestLexer.g4
$ javac TestLexer.java
$ grun TestLexer tokens -tokens
x 1 1.2
[@0,0:0='x',<VAR>,1:0]
[@1,1:1=' ',<' '>,1:1]
[@2,2:2='1',<INT>,1:2]
[@3,3:3=' ',<' '>,1:3]
[@4,4:6='1.2',<DEC>,1:4]
[@5,7:6='<EOF>',<EOF>,1:7]

CodePudding user response:

You can just use the command-line TestRig tool with the -tokens option.

(This is the tool that the ANTLR website recommends setting up as the grun alias.)
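If you want to freeze that check into something repeatable, the same token stream can be asserted in a unit test. A rough sketch using JUnit 5 (the test class, helper method, and expected-name list are illustrative, not part of the answer; the expected names mirror the transcript above):

import static org.junit.jupiter.api.Assertions.assertEquals;

import java.util.List;
import java.util.stream.Collectors;
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.junit.jupiter.api.Test;

class TestLexerTest {
    // Lex the input and return the symbolic name of every token produced.
    private List<String> tokenNames(String input) {
        TestLexer lexer = new TestLexer(CharStreams.fromString(input));
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        tokens.fill();
        return tokens.getTokens().stream()
                .map(t -> TestLexer.VOCABULARY.getSymbolicName(t.getType()))
                .collect(Collectors.toList());
    }

    @Test
    void lexesSampleInput() {
        // Same input as the grun session. Note that grun displays the
        // literal <' '> for PLUS, while getSymbolicName returns the rule
        // name; EOF shows up as a final token either way.
        assertEquals(List.of("VAR", "PLUS", "INT", "PLUS", "DEC", "EOF"),
                tokenNames("x 1 1.2"));
    }
}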
