Lex/Yacc-based C parser: why is an unterminated string literal not diagnosed?

Time:12-03

I built a C parser from the Lex/Flex and YACC/Bison grammars (1, 2) like this:

$ flex c.l && yacc -d c.y && gcc lex.yy.c y.tab.c -o c

and then tested it on this C code:

char* s = "xxx;

I expected this to produce a "missing terminating " character" diagnostic (or a syntax error). However, it doesn't:

$ ./c t1.c
char* s = xxx;

Why? How can I fix it?

Note: STRING_LITERAL is defined in the lex specification as:

L?\"(\\.|[^\\"])*\"     { count(); return(STRING_LITERAL); }

Here the [^\\"] part is meant to represent "except the double-quote ", backslash \, or new-line character" (C11, 6.4.5 String literals, 1), although, unlike the standard's wording, it does not exclude new-line. The \\. part (incorrectly?) represents an escape-sequence (C11, 6.4.4.4 Character constants, 1). -- end note

UPD: Fix: STRING_LITERAL should be defined in the lex specification as:

L?\"(\\.|[^\\"\n])*\"   { count(); return(STRING_LITERAL); }
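To make the difference between the two patterns concrete, here is a hypothetical hand-written C function (match_string_literal is not part of the linked c.l) that mimics what flex's longest match would find at a given position. A nonzero allow_newline models the original pattern and zero models the fixed one. In both, \\. cannot consume a newline, because . never matches a newline in lex. This is a sketch for illustration, not a replacement for the lex rule.

```c
#include <stddef.h>

/* Hypothetical helper (not part of c.l): returns the length of the
   STRING_LITERAL that a flex-style longest match would find at the start
   of p, or 0 if the pattern does not match there.
   allow_newline != 0 models the original pattern  L?\"(\\.|[^\\"])*\"
   allow_newline == 0 models the fixed pattern     L?\"(\\.|[^\\"\n])*\"
   In both patterns, \\. cannot consume a newline ('.' excludes '\n'). */
static size_t match_string_literal(const char *p, int allow_newline)
{
    const char *s = p;

    if (*s == 'L')                 /* optional wide-string prefix */
        s++;
    if (*s != '"')
        return 0;
    s++;

    for (;;) {
        if (*s == '"')             /* closing quote ends the match */
            return (size_t)(s - p) + 1;
        if (*s == '\0')            /* end of input: no closing quote */
            return 0;
        if (*s == '\\') {          /* \\. : backslash plus a non-newline */
            if (s[1] == '\0' || s[1] == '\n')
                return 0;
            s += 2;
        } else if (*s == '\n' && !allow_newline) {
            return 0;              /* [^\\"\n] rejects the newline */
        } else {
            s++;                   /* ordinary character */
        }
    }
}
```

For the one-line t1.c input, neither pattern matches, since there is no closing quote at all; the lone " then falls through to whatever rule matches single characters. The two patterns differ only when the next " is on a later line, where the original pattern matches straight across the newline.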

Answer:

The lexer you linked has a rule:

.           { /* Add code to complain about unmatched characters */ }

so when it sees an unmatched ", it silently ignores it. If you add code to that action to complain about the character, you will see the stray quote reported.

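If you want a syntax error instead, you could have this action simply return *yytext; the parser then receives the " character itself as a token, which no grammar rule expects, and reports a syntax error.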

Note that your STRING_LITERAL pattern will match strings that contain embedded newlines, so if a larger program has a mismatched " and another string later, the lexer will recognize everything in between as one long string with embedded newlines. This will likely lead to poor error reporting, since the error would be reported after the runaway string rather than where it starts, making it hard for the user to debug.
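One way to improve this is to report the error at the opening quote. Below is a minimal, hypothetical sketch of that idea in plain C: find_unterminated_string is not part of the linked lexer, it follows the fixed pattern's rule that a literal may not span a newline, and for brevity it does not skip comments or character constants such as '"'. In a flex lexer the same effect is commonly obtained with an extra rule for the unterminated form, such as L?\"(\\.|[^\\"\n])* , whose action reports the error; because of flex's longest-match rule it only fires when the closing quote is missing.

```c
/* Hypothetical helper: scans src and returns 1 if it finds a string
   literal that is not closed before a newline or the end of input,
   storing the 1-based line and column of its opening quote.
   Returns 0 otherwise. Simplification: comments and character
   constants are not skipped. */
static int find_unterminated_string(const char *src, int *line, int *col)
{
    int cur_line = 1, cur_col = 1;
    const char *s = src;

    while (*s != '\0') {
        if (*s == '\n') {                  /* track line/column */
            cur_line++;
            cur_col = 1;
            s++;
            continue;
        }
        if (*s != '"') {
            cur_col++;
            s++;
            continue;
        }

        /* Opening quote: remember where the literal starts. */
        int start_line = cur_line, start_col = cur_col;
        s++;
        cur_col++;

        for (;;) {
            if (*s == '\0' || *s == '\n') {  /* no closing quote on this line */
                *line = start_line;
                *col = start_col;
                return 1;
            }
            if (*s == '"') {                 /* properly terminated */
                s++;
                cur_col++;
                break;
            }
            if (*s == '\\' && s[1] != '\0' && s[1] != '\n') {
                s += 2;                      /* escape sequence */
                cur_col += 2;
                continue;
            }
            s++;
            cur_col++;
        }
    }
    return 0;
}
```

For the t1.c input this points at line 1, column 11 (the opening quote), and in a multi-line file it points at the line of the bad literal rather than at some later string.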
