How to parse a single escape character between escape characters using ANTLR

Time:02-13

I have a string like RANDOM = "SOMEGIBBERISH ("DOG CAT-DOG","DOG CAT-DOG")". For quoted string literals I use:

StringLiteralSQ : UnterminatedStringLiteralSQ '\'' ;
UnterminatedStringLiteralSQ : '\'' (~['\r\n] | '\\' (. | EOF))* ;   
StringLiteralDQ : UnterminatedStringLiteralDQ '"' ;
UnterminatedStringLiteralDQ : '"' (~[\r\n] | '\\' (. | EOF))* ;

This parses the string above. I need to identify the words as comma-separated tokens, such as DOG CAT-DOG. For this I use something like:

options : name EQUALS value
  | OPTIONS L_PAREN name EQUALS value (COMMA name EQUALS value)* R_PAREN
  ;

However, when I change the string to this format, RANDOM = "SOMEGIBBERISH ("DOG CAT-DOG"DOG CAT-DOG")", parsing fails with an out-of-memory error.

I want to keep parsing the strings I was parsing before, and also parse this kind of string, ("DOG CAT-DOG"DOG CAT-DOG"), perhaps treating it as a single token. How can I do that?

CodePudding user response:

Your question is a bit confusing, so I'm not sure I understand what you are after. You ask for handling escaped characters, but then you don't show any input which uses escapes.

However, I think you are making things way too complicated. Look in other grammars to see how they define string tokens, including escape handling. Here's a typical example:

fragment SINGLE_QUOTE: '\'';
fragment DOUBLE_QUOTE: '"';

DOUBLE_QUOTED_TEXT: (
        DOUBLE_QUOTE ('\\'? .)*? DOUBLE_QUOTE
    ) 
;

SINGLE_QUOTED_TEXT: (
        SINGLE_QUOTE ('\\'? .)*? SINGLE_QUOTE
    ) 
;
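
To see how that non-greedy rule treats escapes, here is a rough sketch that approximates DOUBLE_QUOTED_TEXT with Python's re module. It is not ANTLR, and the two engines are not guaranteed to agree on every input; the helper name first_string_token is made up for illustration.

```python
import re

# Approximation of: DOUBLE_QUOTE ('\\'? .)*? DOUBLE_QUOTE
# re.DOTALL lets '.' match newlines too, similar to the lexer wildcard.
DOUBLE_QUOTED_TEXT = re.compile(r'"(?:\\?.)*?"', re.DOTALL)


def first_string_token(text):
    """Return the first double-quoted token found in text, or None."""
    match = DOUBLE_QUOTED_TEXT.search(text)
    return match.group(0) if match else None


# A backslash-escaped quote stays inside the token.
print(first_string_token(r'x = "a \"quoted\" word" rest'))

# An unescaped inner quote ends the token early, which is why the
# input from the question is not matched as one token by this rule.
print(first_string_token('RANDOM = "SOMEGIBBERISH ("DOG CAT-DOG")"'))
```

The second call stops at the quote right after the opening parenthesis, so a rule like this only helps if the inner quotes are actually escaped in the input.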

CodePudding user response:

If you want to tokenize "SOMEGIBBERISH ("DOG CAT-DOG"DOG CAT-DOG")" as a single token (so treat (...) as an escaped part of the string), you can try something like this:

STRING
 : '"' STRING_ATOM* '"'
 ;

fragment STRING_ATOM
 : ~[\\"\r\n()]
 | '\\' ~[\r\n]
 | '(' ~[)]* ')'
 ;
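
As a quick sanity check of the idea, the three STRING_ATOM alternatives can be approximated with a Python regular expression. This is only a sketch under the assumption that the regex mirrors the lexer rule closely enough for these inputs; find_string_tokens is a hypothetical helper, not part of ANTLR.

```python
import re

# Approximation of: STRING : '"' STRING_ATOM* '"' ;
STRING = re.compile(
    r'"'
    r'(?:'
    r'[^\\"\r\n()]'   # ~[\\"\r\n()]  : any ordinary character
    r'|\\[^\r\n]'     # '\\' ~[\r\n]  : a backslash escape
    r'|\([^)]*\)'     # '(' ~[)]* ')' : a parenthesized part, quotes allowed
    r')*'
    r'"'
)


def find_string_tokens(text):
    """Return every substring of text that matches the STRING pattern."""
    return [m.group(0) for m in STRING.finditer(text)]


# Both inputs from the question come back as a single token each.
print(find_string_tokens('RANDOM = "SOMEGIBBERISH ("DOG CAT-DOG"DOG CAT-DOG")"'))
print(find_string_tokens('RANDOM = "SOMEGIBBERISH ("DOG CAT-DOG","DOG CAT-DOG")"'))

# An opening parenthesis without a closing one gives no match at all.
print(find_string_tokens('"a (b"'))
```

Because the parenthesized alternative accepts anything except a closing parenthesis, quotes inside (...) no longer terminate the string. Nested parentheses are not handled by this pattern.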