Does placement of tokens in the Lexer matter?

Is there any difference in either semantics or performance depending on where tokens are placed in the lexer file? For example:

EQUAL              :        '=';     // Equal, also var:=val which is unsupported
NAMED_ARGUMENT     :        ':=';    // often used when calling custom Macro/Function

Vs.

NAMED_ARGUMENT     :        ':=';    // often used when calling custom Macro/Function
EQUAL              :        '=';     // Equal, also var:=val which is unsupported

CodePudding user response：

In this example, the order won’t matter. If the Lexer finds :=, then it will generate a NAMED_ARGUMENT token: EQUAL cannot match at the :, so only NAMED_ARGUMENT applies, and in any case := is a longer sequence of characters than =.

The Lexer will prefer the rule that matches the longest sequence of input characters.

The only time order matters is when multiple Lexer rules match the same length sequence of characters. In that case the Lexer generates a token for the rule that appears first in the grammar. So, for example, be sure to put keywords before something like an ID rule: it’s quite likely that your keyword will also match the ID rule, but because the keyword rule occurs before ID, the keyword will be selected.
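
To make the two rules concrete, here is a rough Python sketch of the selection logic. It is a toy model, not ANTLR itself: `next_token` is a made-up helper, the rules are plain regular expressions, and the `IF` and `ID` rules are hypothetical additions that are not in the question's grammar. It only illustrates "longest match wins, and the earlier rule wins a tie".

```python
import re

def next_token(rules, text, pos=0):
    """Pick the token starting at pos from an ordered list of (name, regex) rules.

    The list order stands in for the order of lexer rules in a grammar file.
    """
    best = None
    for name, pattern in rules:
        m = re.match(pattern, text[pos:])
        if m is None or m.end() == 0:
            continue  # this rule does not match here
        # Only a strictly longer match replaces the current best,
        # so on a tie the rule listed first is kept.
        if best is None or len(m.group()) > len(best[1]):
            best = (name, m.group())
    return best

# The two rules from the question, in both orders.
eq_first = [("EQUAL", r"="), ("NAMED_ARGUMENT", r":=")]
named_first = list(reversed(eq_first))
print(next_token(eq_first, ":=val"))     # ('NAMED_ARGUMENT', ':=')
print(next_token(named_first, ":=val"))  # ('NAMED_ARGUMENT', ':=')

# Hypothetical keyword and identifier rules, where order does matter.
kw_first = [("IF", r"if"), ("ID", r"[a-z]+")]
id_first = [("ID", r"[a-z]+"), ("IF", r"if")]
print(next_token(kw_first, "if"))    # ('IF', 'if')     tie, first rule wins
print(next_token(id_first, "if"))    # ('ID', 'if')     tie, first rule wins
print(next_token(kw_first, "iffy"))  # ('ID', 'iffy')   longer match wins
```

Swapping EQUAL and NAMED_ARGUMENT changes nothing, while swapping the keyword and ID rules changes which token the input "if" produces.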