Flex can't handle EOF in Action, or how to?

Time:09-19

Environment

  • OS: macOS Monterey 12.6 on an Apple M1 Pro
  • version: flex 2.6.4 Apple(flex-34)

Problem

%{
#include <stdio.h>
int yywrap(void) {
    return 1;
}
%}

ID  [a-z][a-zA-Z0-9]*
DIGIT [0-9]

%%

{DIGIT}     {printf("num");}

{ID}    printf("id: %s\n", yytext);

"/*"    {
    int c;

    for ( ; ; ) {
        while ( (c = input()) != '*' && c != EOF )
            printf("%x\n", c);

        if ( c == '*' ) {
            while ( (c = input()) == '*' )
                printf("%x\n", c);
            if ( c == '/' )
                break;    /* found the end */
            }

        if ( c == EOF ) {
            printf( "EOF in comment" );
            break;
        }
    }
}
%%

int main(int argc, char **argv) {
    ++argv, --argc;    /* skip over the program name */
    if (argc > 0) {
        yyin = fopen(argv[0], "r");
    } else {
        yyin = stdin;
    }
    yylex();
    return 0;
}

I tried this code, hoping to handle EOF in the action myself. But it fails on a file like this:

/* **jfiowejfiowe
 *
 *

Yes, I just want to test an unclosed comment at the end of the file.

lex test.lex
gcc lex.yy.c -o lextest
./lextest comment.c

and it outputs

20
2a
66
69
6f
77
65
6a
66
69
6f
77
65
a
20
20
0
0
0
...

followed by an endless stream of 0s.

I also tried rewriting yywrap() to return 0:

int yywrap() {
    return 0;
}

But then the output is

20
2a
66
69
6f
77
65
6a
66
69
6f
77
65
a
20
20
[1]    57558 segmentation fault  ./lextest comment.c

which ends in a segmentation fault.

I also tried yyterminate, like this:

#define yyterminate() return YY_NULL

int yywrap(void) {
    yyterminate();
}

That still ends in a segmentation fault.

CodePudding user response:

Up to version 2.6.0, the input() function returned EOF on end of input. For reasons which are not at all clear to me, this was changed for version 2.6.1 (released in March, 2016); since then, input() returns 0 on end of input, making it pretty well impossible to distinguish between end of input and a NUL character. The Flex documentation has not been updated to reflect this change.

A very clumsy workaround is available. In yywrap, you could set a global flag (or put a flag in the yyextra data, for a reentrant scanner), and then you could check the flag after input() returns 0. This is not a fully general solution, but it should work in your case.
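A minimal sketch of that flag workaround, assuming a non-reentrant scanner. The name at_eof is made up for illustration. The action still compares against EOF as well, so the same file should also behave with flex 2.6.0 and earlier. Once the error is reported it calls yyterminate(), on the assumption that nothing useful can follow end of input.

```lex
%{
#include <stdio.h>

/* Hypothetical flag (not part of flex): set when yywrap() is reached,
   i.e. when the scanner has really run out of input. */
static int at_eof = 0;

int yywrap(void) {
    at_eof = 1;
    return 1;               /* no further input files */
}
%}

%%

"/*"    {
            int c;

            for ( ; ; ) {
                c = input();

                if ( c == '*' ) {
                    while ( (c = input()) == '*' )
                        ;   /* skip a run of stars */
                    if ( c == '/' )
                        break;    /* found the end */
                }

                /* flex >= 2.6.1 returns 0 at end of input, older
                   versions return EOF; the flag separates a real
                   end of input from a NUL byte in the file. */
                if ( (c == 0 && at_eof) || c == EOF ) {
                    fprintf(stderr, "EOF in comment\n");
                    yyterminate();
                }
            }
        }

.|\n    { /* ignore everything else in this sketch */ }

%%

int main(int argc, char **argv) {
    if (argc > 1 && (yyin = fopen(argv[1], "r")) == NULL) {
        perror(argv[1]);
        return 1;
    }
    yylex();
    return 0;
}
```

Built the same way as in the question (lex test.lex, then gcc lex.yy.c -o lextest), this should print "EOF in comment" once for the sample file instead of looping.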

Alternatively, you could install an older Flex release (2.5.39 and 2.6.0 are both available at the GitHub release page). Or you could download the source for version 2.6.4 and patch the file flex.skl at line 1859, changing it from return 0 to return EOF.

Whatever the reason for the change, it's certainly a bug, either in the behaviour or in the documentation, and it has been reported several times: most recently, apparently, by you, and previously as issue 212 in 2017 and issues 444 and 448 in 2020. That last one contains references to bug reports in other places (Debian, for example).

On the whole, I feel that the use of input() should be avoided if at all possible. It has its uses, but in most cases, you'll find it faster and more convenient to use start conditions. In the particular case of multiline comments, even that's unnecessary; they can be recognised with a pair of patterns. (One for correct comments, and another one to flag an error for unterminated comments.)
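The pair of patterns mentioned above could look roughly like this. It is a sketch rather than a drop-in scanner: the catch-all rule and the error message are placeholders. It relies on flex preferring the longest match. For a properly closed comment the first pattern matches one character more (the closing slash) and wins; for an unclosed comment only the second pattern can match, and it runs to the end of the input.

```lex
%{
#include <stdio.h>
%}

%option noyywrap noinput nounput

%%

"/*"([^*]|\*+[^*/])*\*+"/"    { /* complete comment: discard it */ }

"/*"([^*]|\*+[^*/])*\**       { fprintf(stderr, "EOF in comment\n"); }

.|\n                          { /* ignore everything else in this sketch */ }

%%

int main(void) {
    yylex();    /* reads from stdin by default */
    return 0;
}
```

No call to input() is needed here, so the changed return value at end of input never comes into play.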
