Why do GCC and Clang produce different output with variable length array?

Time:04-02

Why do GCC and Clang produce different output with this conforming C code:

int (puts) (); int (main) (main, puts) int main;
char *puts[(&puts) (&main["\0April 1"])]; <%%>

Neither compiler produces any warning or error, even with -Wall -std=c18 -pedantic, yet the program produces no output when built with GCC and prints the current date when built with Clang.

CodePudding user response:

Why do GCC and Clang produce different output with this conforming C code:

int (puts) (); int (main) (main, puts) int main;
char *puts[(&puts) (&main["\0April 1"])]; <%%>

In the first place, it is conforming code, though it does make use of a variable-length array, which is an optional language feature in C11 and C17. Some of the obfuscations are

  • use of the obscure digraphs <% and %>, which mean the same thing as { and }, respectively.
  • parenthesizing the function identifiers in function declarations
  • a forward declaration of function puts that is not a prototype
  • a K&R-style definition of function main
    • with a VLA parameter
      • whose dimension expression contains a function call
      • and a reference to another parameter
  • use of unconventional identifiers for the parameters to function main()
  • use of the same identifiers (puts and main) for both a function and an object (here, a parameter)
  • use of the identifier main for something more than the program's entry-point function
  • inversion of the conventional order of the operands of the indexing operator ([])
    • plus, indexing a string literal
  • calling a function via an explicit function pointer constant expression
  • A string literal with an explicit null character within
  • Unconventional placement (and omission) of line breaks

A less obfuscated equivalent would be

int puts();

int main(
    int argc,
    char *argv[ puts("\0April 1" + argc) ]
) {
}

But the central question about the difference in behavior between the version compiled with GCC and the one built with Clang comes down to whether the expression for the size of the VLA function parameter is evaluated at runtime.

The language spec says that when a function parameter is declared with array type, its type is "adjusted" to the corresponding pointer type. That applies equally to complete, incomplete, and variable-length array types, but the spec does not explicitly say that the expression(s) for the dimension(s) are not evaluated. It does specify that expressions go unevaluated in certain other cases, and it even makes an exception to such a rule in the case of sizeof expressions involving VLAs, so the omission in this case could be interpreted as meaningful.

That makes a difference only for parameters of VLA type, because only for those can evaluation of the dimension expression(s) produce side effects on the machine state, including, but not limited to, observable program behavior.

GCC does not evaluate the VLA parameter's size expression at runtime, and I am inclined to take this as conforming to the intent of the standard. As a result, the GCC-compiled program does nothing but exit with status 0.

Clang does evaluate the VLA parameter's size expression at runtime. Although I disfavor this interpretation of the spec, I cannot rule it out. When it does evaluate the size expression, it uses the passed value of the first parameter. When the program is run without arguments, then the first parameter has value 1, with the result that the standard library's puts function is called with a pointer to the 'A' in "\0April 1".

CodePudding user response:

int (puts) ();
int (main) (main, puts)
    int main;
    char *puts[(&puts) (&main["\0April 1"])];
{
}

Somebody's got a compiler bug; I'm just not sure who anymore. I don't understand why any compiler would emit code to evaluate the size expression of a VLA parameter.

The clang output is rather bizarre. For it to work, it would have had to find main in the function's scope but puts in the global scope despite having already encountered the declaration for puts. Normally, you can access a variable in its own declaration.

If somebody did this in production code my answer would be rather: "Stop using K&R function definitions."

  • Tags: c