I have a string like RANDOM = "SOMEGIBBERISH ("DOG CAT-DOG","DOG CAT-DOG")". For quoted string literals I use:
StringLiteralSQ : UnterminatedStringLiteralSQ '\'' ;
UnterminatedStringLiteralSQ : '\'' (~['\r\n] | '\\' (. | EOF))* ;
StringLiteralDQ : UnterminatedStringLiteralDQ '"' ;
UnterminatedStringLiteralDQ : '"' (~[\r\n] | '\\' (. | EOF))* ;
This parses the string above. I need to identify the words as comma-separated tokens, like DOG CAT-DOG. For this I use something like:
options : name EQUALS value
| OPTIONS L_PAREN (name EQUALS value) (COMMA (name EQUALS value))* R_PAREN
;
However, when the string has this format: RANDOM = "SOMEGIBBERISH ("DOG CAT-DOG"DOG CAT-DOG")", it fails with an out-of-memory error.
I want to keep parsing the strings I could parse before, and also parse this kind of string, ("DOG CAT-DOG"DOG CAT-DOG"), treating it as a single token if possible. How can I do that?
CodePudding user response:
Your question is a bit confusing, so I'm not sure I understand what you are after. You ask for handling escaped characters, but then you don't show any input which uses escapes.
However, I think you are making things way too complicated. Look in other grammars to see how they define string tokens, including escape handling. Here's a typical example:
fragment SINGLE_QUOTE: '\'';
fragment DOUBLE_QUOTE: '"';
DOUBLE_QUOTED_TEXT: (
DOUBLE_QUOTE ('\\'? .)*? DOUBLE_QUOTE
)
;
SINGLE_QUOTED_TEXT: (
SINGLE_QUOTE ('\\'? .)*? SINGLE_QUOTE
)
;
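To get a feel for what a rule of this shape accepts, here is a rough Python regex analogue of DOUBLE_QUOTED_TEXT. It is only an approximation for illustration: ANTLR's lexer is not a backtracking regex engine, and the helper name first_double_quoted is made up for this sketch. It suggests that escaped quotes are kept inside the token, while the input from the question would end at the first unescaped inner quote.

```python
import re

# Rough regex analogue of: DOUBLE_QUOTE ('\\'? .)*? DOUBLE_QUOTE
# Assumption: this only approximates the ANTLR rule; the real lexer may differ in edge cases.
DOUBLE_QUOTED_TEXT = re.compile(r'"(?:\\?.)*?"', re.DOTALL)


def first_double_quoted(text):
    """Return the first double-quoted token found in text, or None (hypothetical helper)."""
    match = DOUBLE_QUOTED_TEXT.search(text)
    return match.group(0) if match else None


# Input that really uses backslash escapes: the escaped quotes stay inside the token.
escaped = r'"DOG \"CAT\" DOG" tail'
# Input from the question: no escapes, so the token ends at the first inner quote.
nested = '"SOMEGIBBERISH ("DOG CAT-DOG"DOG CAT-DOG")"'

print(first_double_quoted(escaped))
print(first_double_quoted(nested))
```

Under this approximation the second input is cut off right after the opening parenthesis, so a plain escape-aware string rule on its own would not yield a single token for the parenthesized form.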
CodePudding user response:
If you want to tokenize "SOMEGIBBERISH ("DOG CAT-DOG"DOG CAT-DOG")" as a single token (that is, treat (...) as an escaped part of the string), you can try something like this:
STRING
: '"' STRING_ATOM* '"'
;
fragment STRING_ATOM
: ~[\\"\r\n()]
| '\\' ~[\r\n]
| '(' ~[)]* ')'
;
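A quick way to sanity-check the idea outside ANTLR is to translate the STRING rule into a regular expression. The Python sketch below is such a translation, written under the assumption that the regex mirrors the three STRING_ATOM alternatives closely enough for these inputs; the function name is_single_string_token is invented for the example.

```python
import re

# Regex approximation of:
#   STRING      : '"' STRING_ATOM* '"'
#   STRING_ATOM : ~[\\"\r\n()] | '\\' ~[\r\n] | '(' ~[)]* ')'
# Assumption: close enough to the lexer rule for the sample inputs below.
STRING = re.compile(
    r'"(?:'
    r'[^\\"\r\n()]'   # any ordinary character
    r'|\\[^\r\n]'     # a backslash escape
    r'|\([^)]*\)'     # a parenthesized part, quotes allowed inside
    r')*"'
)


def is_single_string_token(text):
    """True if the whole text matches the STRING pattern (hypothetical helper)."""
    return STRING.fullmatch(text) is not None


with_comma = '"SOMEGIBBERISH ("DOG CAT-DOG","DOG CAT-DOG")"'
without_comma = '"SOMEGIBBERISH ("DOG CAT-DOG"DOG CAT-DOG")"'

print(is_single_string_token(with_comma))
print(is_single_string_token(without_comma))
print(is_single_string_token('"abc (unclosed"'))
```

Both sample strings from the question match as a whole, while a string with an unclosed parenthesis does not, because the third alternative requires a closing ')'.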