I am trying to write a parser for the IBM Assembler Language; an example is below.
Comment lines start with a star (*) as the first character. However, there are two problems:
Beyond a set point in the line there can also be descriptive text, but no star is necessary there.
The descriptive text can (and does) contain lexer tokens, such as ENTRY or INPUT.
* TYPE.
ARG      DSECT
NXENT    DS    F        some comment text ENTRY NUMBER
NMADR    DS    F        some comment text INPUT NAME
NAADR    DS    F        some comment text
NATYP    DS    F        some comment text
NAENT    DS    F        some comment text
         ORG   NATYP    some comment text
In my lexer I have devised the following, which works absolutely fine:
fragment CommentLine: Star {getCharPositionInLine() == 1}? .*? Nl
;
fragment Star: '*';
fragment Nl: '\r'? '\n' ;
COMMENT_LINE
: CommentLine -> channel (COMMENT)
;
My question is: how do I manage the line comments that start at a particular character position in the parser grammar? I.e. Parser -> NAME DS INT? LETTER ??????????
CodePudding user response:
Sending comments to a COMMENT channel (or skipping them with -> skip) is a technique used to avoid having to define all the places comments are valid in your parser rules.
(Old 360 Assembler programmer here)
Since there is not really any way to place arbitrarily positioned comments in Assembler source, you don't need to deal with shunting them off to the side. In fact, because of the way comments are handled in Assembler source (a trailing comment is simply whatever text follows the operands), there's just NOT a way to identify them in a lexer rule.
Since it can be handled in a parser rule instead, you could set up a rule like:
trailingComment: (ID | STRING | NUMBER)* EOL;
where ID, STRING, NUMBER, etc. are just the tokens in your lexer. (You'd need to include pretty much all of them, which is a good argument for not getting down to individual tokens for MVC, CLC, CLI, and all the other op codes. That is the path to madness.) And of course EOL is your rule to match end of line (probably '\r'? '\n').
You would then end each of your rules for parsing a line that can contain a trailing comment (pretty much all of them) with the trailingComment rule.
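As a rough illustration of that layout, here is a minimal combined-grammar sketch covering only the statements in the question's sample. Everything except trailingComment, ID, STRING, NUMBER and EOL is an assumption made up for the example: the grammar name, the program and line rules, the DS, DSECT and ORG keyword tokens, and the catch-all OTHER token. Full-line comments starting with a star are left to a lexer rule like the one in the question and are not repeated here.

```antlr
grammar AsmSketch;   // hypothetical name, for illustration only

program
    : line* EOF
    ;

// Every statement alternative ends with trailingComment,
// so the comment text never has to be recognised by the lexer.
line
    : ID? DS ID trailingComment      // NXENT DS F some comment text
    | ID? DSECT trailingComment      // ARG DSECT
    | ORG ID trailingComment         // ORG NATYP some comment text
    | EOL                            // blank line
    ;

// Any run of tokens up to the end of the line counts as comment text.
// Keyword tokens must be listed too, since comments may contain them.
trailingComment
    : (ID | STRING | NUMBER | DS | DSECT | ORG | OTHER)* EOL
    ;

// Keywords come before ID so they win when the match lengths are equal.
DS     : 'DS' ;
DSECT  : 'DSECT' ;
ORG    : 'ORG' ;
ID     : [A-Za-z@#$] [A-Za-z0-9@#$_]* ;
NUMBER : [0-9]+ ;
STRING : '\'' ( ~['\r\n] | '\'\'' )* '\'' ;
EOL    : '\r'? '\n' ;
WS     : [ \t]+ -> skip ;
OTHER  : . ;   // catch-all so punctuation in comment text still tokenises
```

Two things to keep in mind with this sketch: the last line of the input has to end with a newline, because trailingComment insists on EOL, and skipping whitespace throws away the column information, so the comment is simply whatever is left over after the operands have been matched.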
CodePudding user response:
I was heading down the path described by Mike Lischke before making my post, and I had previously stumbled across the technique you mentioned as well, but thought I would canvass for other opinions. If I am honest, I prefer your approach, as I would then get the advantage of the grammar's visitor and listener events. I want the grammar to have some level of intelligence, but Mike L's technique of chunking up the line puts the intelligence up into the application, unless I am misunderstanding?