I know that this type of question is somewhat frowned upon on StackOverflow, but I think it has good intent and can be answered in a meaningful way that helps with coding in ANTLR.
I usually see (and have also adopted) the standard formatting used by most g4 grammars, which looks something like this:
id
    : something
    | something-else
    ...
    ;
This works fine and is nicely readable for highly nested items, and I get why it's used. However, I often end up scrolling through hundreds of lines of code to understand lexer rules that look much like this:
MINUS
    : '-'
    ;
TIMES
    : '*'
    ;
DIV
    : '/'
    ;
GT
    : '>'
    ;
LT
    : '<'
    ;
EQ
    : '='
    ;
POINT
    : '.'
    ;
POW
    : '^'
    ;
I find this quite hard to read and keep track of, as I'm often scrolling back and forth just to see what related tokens mean. Is there any disadvantage to using formatting such as the following for lexer tokens that involve almost no complexity or alternation?
MINUS : '-';
TIMES : '*';
DIV : '/';
GT : '>';
LT : '<';
EQ : '=';
POINT : '.';
POW : '^';
Answer:
Formatting is indeed a matter of taste (I, for one, consider the colon and semicolon to be block delimiters and hence place them the way you would place braces in source code). However, grammar rules can differ greatly in both size and complexity, so there's no one-size-fits-all formatting approach. It's totally fine to use multiple lines for large or complex rules and a single line for short rules (or blocks).
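As a rough illustration of mixing the two styles, here is a minimal sketch. The grammar name `Expr` and the rules `expr`, `NUMBER` and `WS` are hypothetical and only there to make the fragment complete; the token rules are the ones from the question.

```antlr
grammar Expr;  // hypothetical grammar name, for illustration only

// Larger rule with several alternatives: one alternative per line.
expr
    : expr POW expr
    | expr (TIMES | DIV) expr
    | expr MINUS expr
    | expr (GT | LT | EQ) expr
    | NUMBER
    ;

// Trivial single-literal tokens: one rule per line.
MINUS : '-';
TIMES : '*';
DIV   : '/';
GT    : '>';
LT    : '<';
EQ    : '=';
POINT : '.';
POW   : '^';

// Hypothetical helper tokens so the sketch stands on its own.
NUMBER : [0-9]+;
WS     : [ \t\r\n]+ -> skip;
```

ANTLR treats whitespace between grammar tokens as insignificant, so both layouts describe the same rules; the choice only affects readability.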
I have taken these aspects into account in my ANTLR4 grammar formatter, which comes with a large list of options for configuring how you would like your grammar to look, such as break-before-parentheses, hanging-semicolon and min-empty-lines (to name just a few). This formatter is part of the ANTLR4 extension for VS Code.
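As a sketch of how this might look in practice: the formatter reads its settings from special `$antlr-format` comments inside the grammar file. The option names and values below are written from memory of the extension's formatting documentation and should be treated as assumptions; check them against the current docs before relying on them.

```antlr
// Assumed option names; verify against the extension's formatting docs.
// $antlr-format allowShortRulesOnASingleLine true, minEmptyLines 0
// $antlr-format alignColons trailing, alignSemicolons none

MINUS : '-';
TIMES : '*';
DIV   : '/';
```

With settings along these lines, short lexer rules should stay on a single line, while a later `$antlr-format` comment can switch to a multi-line layout for the parser rules.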