Antlr4: Matching '\r\n' and '\n' as one symbol

Time:10-26

I'm trying to limit the number of empty lines between blocks to .

EOL:
   '\r'?
   '\n'
   '\r'?
   '\n'?
   SPACE*
   ;

Is there a way to match '\n' and '\r\n' as one symbol, like '\R' (linebreak) in Perl 5? Or to ignore '\r' completely?

CodePudding user response:

Every lexer rule produces a single "symbol" (in ANTLR these are called tokens).

The typical way to match EOL in ANTLR would be something like:

EOL: '\r'? '\n';

This matches an optional carriage return followed by a line feed, i.e. either \r\n or \n.
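
For the other option raised in the question, ignoring '\r' completely, one possible sketch is to give the carriage return its own skipped rule so that only '\n' ever produces a token. The rule name CR is made up for illustration, and this assumes no other rule in the grammar needs to see '\r':

```antlr
// Hypothetical alternative: discard every carriage return in the lexer.
CR:  '\r' -> skip ;   // '\r' never produces a token
EOL: '\n' ;           // both "\r\n" and "\n" input yield one EOL token
```

The trade-off is that a stray '\r' in the middle of a line is dropped silently as well, so the '\r'? '\n' form may be the safer default.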

It's pretty customary to put whitespace into a separate rule with a -> skip directive or a -> channel(HIDDEN) directive.
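
As a rough sketch of that convention (the rule name WS and the choice of space and tab are assumptions here, not part of the question's grammar), the whitespace rule might look like this:

```antlr
// Option 1: throw horizontal whitespace away entirely.
WS: [ \t]+ -> skip ;

// Option 2 (use instead of option 1): keep the tokens, but on the hidden
// channel, where the parser ignores them while tools can still read them.
// WS: [ \t]+ -> channel(HIDDEN) ;
```

With skip the text is gone for good; channel(HIDDEN) is the usual choice when the original spacing has to be recoverable later, for example in a formatter.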

If you're trying to coalesce end-of-line whitespace and multiple blank lines, try:

EOL:   '\r'? '\n' (' '* '\r'? '\n')*;

This generates a single token covering the line terminator plus any subsequent empty or space-only lines and their line terminators.

lexer grammar sample:

lexer grammar eol
    ;

ALPHA: [A-Za-z]+ ;
EOL:   '\r'? '\n' (' '* '\r'? '\n')*;

sample input file:

ANB



G
KL








ZZ

token output:

➜ grun eol tokens -tokens < eol.txt
[@0,0:2='ANB',<ALPHA>,1:0]
[@1,3:6='\n\n\n\n',<EOL>,1:3]
[@2,7:7='G',<ALPHA>,5:0]
[@3,8:8='\n',<EOL>,5:1]
[@4,9:10='KL',<ALPHA>,6:0]
[@5,11:19='\n\n\n\n\n\n\n\n\n',<EOL>,6:2]
[@6,20:21='ZZ',<ALPHA>,15:0]
[@7,22:21='<EOF>',<EOF>,15:2]