main expects a filename as the first argument ... but I can alternatively provide main the file via

Time:08-03

I have a C program with this main() function:

    int main(int argc, char *argv[])
    {
        FILE *f = fopen(argv[1], "r");
        ...
    }

Notice that it expects a filename to be provided as the first argument when the program is run, e.g.,

main test.dat

The program works fine when I run it that way.

Interestingly, the program also works fine when I run it this way:

cat test.dat | main

That does not give main() a filename; it streams the contents of test.dat to the program's standard input. Right? So how does it work?

Further elaboration: this main() function is the main of a Bison parser. As I mentioned, the parser works fine whether I invoke it this way:

main test.dat

or this way:

cat test.dat | main

Here is the parser's main() function:

    int main(int argc, char *argv[])
    {
        yyin = fopen(argv[1], "r");
        yyparse();
        fclose(yyin);
        return 0;
    }

Answer:

The fundamental problem is that you don't verify that fopen worked. Every call to fopen() should be followed by a check that the return value was not NULL. Otherwise, you will never notice that a user misspelled a filename, for example.

Normally, passing a NULL FILE* to a stdio function is Undefined Behaviour, which typically results in a segfault. That doesn't happen with yyin because the NULL is never passed through to stdio: the (f)lex interface specifies that a NULL yyin means the default behaviour of reading from stdin. That's what happens with your code. Similarly, a NULL yyout is treated as though it were stdout.
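The fallback rule described above can be sketched as a pair of tiny helpers. The names `effective_input` and `effective_output` are hypothetical; they only mirror the documented behaviour for a NULL yyin or yyout and are not taken from Flex's generated source.

```c
#include <stdio.h>

/* Hypothetical helper (not part of Flex): mirrors the rule that a scanner
 * whose input stream is NULL reads from stdin instead. */
static FILE *effective_input(FILE *in)
{
    return in != NULL ? in : stdin;
}

/* Hypothetical helper mirroring the matching rule for a NULL output stream. */
static FILE *effective_output(FILE *out)
{
    return out != NULL ? out : stdout;
}
```

This is why the failed fopen goes unnoticed: the NULL it returns is silently replaced by stdin, which is exactly where `cat test.dat | main` delivers the data.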

It's fine to rely on this behaviour from Flex. It's well-documented and guaranteed by the POSIX standard. But it should only be used deliberately, not accidentally.

If your application is invoked with no command-line arguments, then argc will be 1, argv[0] will be the name used to invoke the program, and argv[1] will be NULL. (Technically, argc could be 0, with even worse consequences, but that's unlikely in practice.) You then pass that NULL to fopen, which is Undefined Behaviour (that is to say, a grievous error). The implementation of fopen in your standard library seems to return an error indication rather than segfaulting, but as noted above you don't check for this error return. So the compounding of errors happens to result in yyin being NULL, and Flex reading from stdin.
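The C standard guarantees that `argv[argc]` is a null pointer, so with `argc == 1` reading `argv[1]` yields NULL, while with `argc == 0` it reads past the end of the array. A minimal sketch of the guard, using a hypothetical helper name `first_argument`, checks `argc` before touching `argv[1]`:

```c
#include <stddef.h>

/* Hypothetical helper: return the first command-line argument, or NULL if
 * there is none. Checking argc first means argv[1] is never read when
 * argc is 0, where it would lie past the end of the argv array. */
static const char *first_argument(int argc, char *argv[])
{
    return argc > 1 ? argv[1] : NULL;
}
```

A NULL result here is an explicit "no filename given" signal that the caller must handle, rather than a value to hand straight to fopen.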

You should always check the validity of user input. Always. Without exception. And you should report errors, or deal with them. There are no excuses. Not checking is dangerous, and at best wastes a lot of time: yours, and that of whoever you enlist to help you.
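When exiting immediately is not the right response, the "deal with them" option can look like the sketch below. The helper name `open_checked` is hypothetical: it reports why the file could not be opened and returns NULL, leaving the recovery decision to the caller.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: open `path` for reading, or report why it failed.
 * Returns NULL on failure so the caller can choose how to recover. */
static FILE *open_checked(const char *path)
{
    if (path == NULL) {
        fprintf(stderr, "No input file given\n");
        return NULL;
    }
    FILE *f = fopen(path, "r");
    if (f == NULL) {
        /* errno was set by the failed fopen; report it before returning. */
        fprintf(stderr, "Could not open file '%s': %s\n",
                path, strerror(errno));
    }
    return f;
}
```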

Correct code might look like this:

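    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    extern FILE *yyin;   /* defined in the Flex-generated scanner */
    int yyparse(void);   /* defined in the Bison-generated parser */

    int main(int argc, char *argv[])
    {
        if (argc > 1) {
            yyin = fopen(argv[1], "r");
            if (yyin == NULL) {
                fprintf(stderr, "Could not open file '%s': %s\n",
                        argv[1], strerror(errno));
                exit(1);
            }
        }
        else {
            /* If there was no command line argument,
             * argc is <= 1. We could leave `yyin` set to NULL
             * but it's better to be explicit.
             */
            yyin = stdin;
        }
        int status = yyparse();
        if (yyin != stdin)
            fclose(yyin);   /* only close a file we opened ourselves */
        return status;
    }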