Difference between Parser.tokens and Lexer.tokens?

Time:08-12

Normally, when I generate code for a target language from a grammar (or run it through grun), I get two .tokens files. For example, given the following:

lexer grammar TestLexer;

NUM  : [0-9] ;
OTHER : ABC;
fragment NEWER : [xyz] ;
ABC : [abc] ;

I get a token for each non-fragment, and I get two identical files:

# Parser.tokens
NUM=1
OTHER=2
ABC=3
# Lexer.tokens
NUM=1
OTHER=2
ABC=3

Are these files always the same? I tried defining a token in the parser, but because it is declared as a parser grammar, that isn't allowed. So I would assume these two files are always the same, correct?

CodePudding user response:

Grammars are always processed as individual lexer and parser grammars. If a combined grammar is used, it is temporarily split into two grammars, which are then processed individually. Each processing step produces a tokens file (the list of lexer tokens found). The tokens file is the link between lexers and parsers: when you set a tokenVocab value, it is actually the tokens file that gets read. That also means you don't need a lexer grammar if you have a tokens file.
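To check for yourself whether two tokens files carry the same mapping, something like the following rough sketch might help. It assumes each line has the form NAME=number, as in the listings in the question, and it skips the `#` label lines used there. The function name `parse_tokens` is made up for this example and is not part of ANTLR.

```python
def parse_tokens(text):
    """Parse the contents of a .tokens file into a {name: type} dict."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#' labels used in the question's listing
        # (assumption: the files themselves contain only NAME=number lines).
        if not line or line.startswith("#"):
            continue
        # Split on the last '=' so a quoted literal such as '='=4 still parses.
        name, _, number = line.rpartition("=")
        mapping[name] = int(number)
    return mapping


lexer_tokens = parse_tokens("""
# Lexer.tokens
NUM=1
OTHER=2
ABC=3
""")
parser_tokens = parse_tokens("""
# Parser.tokens
NUM=1
OTHER=2
ABC=3
""")

# True when both files assign the same number to every token name.
same = lexer_tokens == parser_tokens
print(same)
```

Comparing the parsed dicts rather than the raw text ignores differences in line order or whitespace, which is probably what matters when the question is whether lexer and parser agree on token numbers.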

I'm not sure about the parser.tokens file. It might be useful for grammar imports.

You can also specify a tokenVocab for lexer grammars, which allows you to explicitly assign numeric values to tokens. That can come in handy if you have to check for token ranges (e.g. all keywords) in platform code. I cannot check this currently, but using this feature might lead to tokens files with different content.
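As an illustration of the token range idea, here is a minimal sketch. The token names and numbers are entirely hypothetical and stand in for a numbering where all keywords were deliberately given consecutive values. In real platform code you would use the constants from the generated lexer instead of a hand-written dict.

```python
# Hypothetical numbering: keywords occupy the consecutive range 1..3.
TOKEN_TYPES = {"IF": 1, "ELSE": 2, "WHILE": 3, "ID": 4, "NUM": 5}

KEYWORD_FIRST = TOKEN_TYPES["IF"]
KEYWORD_LAST = TOKEN_TYPES["WHILE"]


def is_keyword(token_type):
    """Return True if the token type falls inside the keyword range."""
    return KEYWORD_FIRST <= token_type <= KEYWORD_LAST


print(is_keyword(TOKEN_TYPES["ELSE"]))
print(is_keyword(TOKEN_TYPES["ID"]))
```

A single range comparison like this only works if the numbering keeps the keywords contiguous, which is why being able to fix the numbers explicitly matters.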
