gcc builtin function and custom function both invoked in same program

Time:10-27

I'm trying to understand when gcc's builtin functions are used. In the following code, both gcc's sqrt() and my custom sqrt() are invoked when I compile without -fno-builtin. Can someone explain what is going on?

Also, I know the list of gcc's builtin functions is at https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html and realize the recommended way around these types of problems is to just rename the conflicting function. Is there a gcc output option/warning that will show when a custom function is named the same as a builtin or when a builtin is used instead of the custom function?

#include <stdio.h>

double sqrt(double);

int main(void)
{
    double n;

    n = 2.0;
    printf("%f\n", sqrt(n));

    printf("%f\n", sqrt(2.0));

    return 0;
}

double sqrt(double x)
{
    printf("my_sqrt ");
    return x;
}

Running the program after compiling with gcc -o my_sqrt my_sqrt.c produces:

my_sqrt 2.000000
1.414214

Running the program after compiling with gcc -fno-builtin -o my_sqrt my_sqrt.c produces:

my_sqrt 2.000000
my_sqrt 2.000000

Answer:

It's not the case that two different sqrt functions are called at runtime. The call to sqrt(2.0) happens at compile time, which is legal because 2.0 is a constant and sqrt is a standard library function, so the compiler knows its semantics. And the compiler is allowed to assume that you are not breaking the rules. We'll get around to what that means in a minute.

At runtime, there is no guarantee that your sqrt function will be called for sqrt(n), but it might be. GCC uses your sqrt function unless you declare n to be const double; Clang goes ahead and does the computation at compile time, because it can figure out what n contains at that point. Both of them will use the built-in sqrt function (unless you specify -fno-builtin) for an expression whose value cannot be known at compile time. But that doesn't necessarily mean that they will emit code to call a function; if the machine has a reliable SQRT opcode, the compiler could choose to emit that opcode rather than a function call.

The C standard gives compilers a lot of latitude here, because it only requires the observable behaviour of a program to be consistent with the results specified by the semantics in the standard, and furthermore it only requires that to be the case if the program does not exhibit undefined behaviour. So the compiler is basically free to do any computation it wants at compile-time, provided that a program without undefined behaviour would produce the same result [Note 1].

Moreover, the definition of "the same result" is also a bit loose for floating point computations, because the standard semantics do not prevent computations from being done with more precision than the data types can theoretically represent. That may seem innocuous, but in some cases a computation with extra precision can produce a different result after rounding. So if during compilation a compiler can compute a more accurate intermediate result than would result from the code it would have generated for run-time computation of the same expression, that's fine as far as the standard is concerned. (And most of the time, it will be fine for you, too. But there are exceptions.)
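
As a rough illustration of the extra-precision point, the C99 header <float.h> defines FLT_EVAL_METHOD, which reports how the implementation evaluates floating-point expressions at run time. The helper names below are hypothetical, and this sketch says nothing about the precision a compiler uses when it folds a constant during compilation.

```c
#include <float.h>
#include <stdio.h>

/* Hypothetical helper: a value of 0 means operations are evaluated in the
 * range and precision of their own type. Any other value (1, 2, or a
 * negative "indeterminable" or implementation-defined value) means extra
 * precision cannot be ruled out. */
static int extra_precision_possible(int method)
{
    return method != 0;
}

/* Hypothetical helper: print what this implementation reports, and
 * return the reported value. */
static int report_eval_method(void)
{
    int method = (int)FLT_EVAL_METHOD;
    printf("FLT_EVAL_METHOD = %d, extra precision possible: %s\n",
           method, extra_precision_possible(method) ? "yes" : "no");
    return method;
}
```

Calling report_eval_method() from main shows what your own toolchain declares; the value can differ between targets and compiler options.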

To return to the main point, it still seems surprising that the compiler, which knows that you have redefined the sqrt function, can still use the built-in sqrt function in its compile-time computation. The reason is simple (although often ignored): your program is not valid. It exhibits undefined behaviour, and when your program has undefined behaviour, all bets are off.

The undefined behaviour is specified in §7.1.3 of the standard, which concerns Reserved Identifiers. It supplies a list of reserved identifiers, which really are reserved, whether the compiler you happen to be using warns you about that or not. The list includes the following, which I'll quote in full:

All identifiers with external linkage in any of the following subclauses (including the future library directions) and errno are always reserved for use as identifiers with external linkage.

The "following subclauses" at this point contain the list of standard library functions, all of which have external linkage. Just to drive the point home, the standard continues with:

If the program declares or defines an identifier in a context in which it is reserved (other than as allowed by 7.1.4), the behavior is undefined. [Note 2]

You have declared sqrt as an externally-visible function, and that's not permitted whether or not you include math.h. So you're in undefined behaviour territory, and the compiler is perfectly entitled to not worry about your definition of the sqrt function when it is doing compile-time computation. [Note 3]

(You could try to declare your sqrt implementation as static in order to avoid the restriction on externally-visible names. That will work with recent versions of GCC; it allows the static declaration to override the standard library definition. Clang, which is more aggressive about compile-time computations, still uses the standard definition of sqrt. And a quick test with MSVC (on godbolt.org) seems to indicate that it just outright bans redefinition of the standard library function.)

So what if you really really want to write sqrt(x) for your own definition of sqrt? The standard does give you an out: since sqrt is not reserved for macro names, you can define it as a macro which is substituted by the name of your implementation [Note 4], at least if you don't #include <math.h>. If you do include the header, then this is probably not conformant, because in that case the identifiers are reserved as well for macro names [Note 5].
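
A minimal sketch of that macro approach, assuming <math.h> is not included anywhere in the translation unit. The names my_sqrt_, custom_calls and demo are made up for illustration; the counter exists only to make it easy to check that the custom function is the one being called.

```c
#include <stdio.h>

/* Hypothetical counter: records how often the custom function runs. */
static int custom_calls = 0;

/* The custom implementation, under a name that is not reserved
 * (the underscore goes at the end, not the beginning). */
static double my_sqrt_(double x)
{
    ++custom_calls;
    printf("my_sqrt ");
    return x;
}

/* Object-like macro: every later use of the token sqrt is replaced by
 * my_sqrt_ before the compiler proper sees it, so no built-in applies. */
#define sqrt my_sqrt_

/* Mirrors the two calls from the question. */
static void demo(void)
{
    double n = 2.0;
    printf("%f\n", sqrt(n));
    printf("%f\n", sqrt(2.0));
}
```

Calling demo() from main should print "my_sqrt 2.000000" twice, with or without -fno-builtin, because after preprocessing the identifier sqrt no longer appears in the code.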

Notes

  1. That liberty is not extended to integer constant expressions, with the result that a compiler cannot turn strlen("Hello") into the constant value 5 in a context where an integer constant expression is required. So this is not legal:

    switch (i) {
        case strlen("Hello"):
            puts("world");
            break;
        default: break;
    }
    

    But this will probably not call strlen six times (although you shouldn't count on that optimisation, either):

    /* Please don't do this. Calling strlen on every loop iteration
     * blows up linear-time loops into quadratic time monsters, which is
     * an open invitation for someone to do a denial-of-service attack
     * against you by supplying a very long string.
     */
    for (int i = 0; i < strlen("Hello"); ++i) {
        putchar("world"[i]);
    }
    
  2. Up to the current C standard, this statement was paragraph 2 of §7.1.3. In the C23 draft, though, it has been moved to paragraph 8 of §6.4.2.1 (the lexical rules for identifiers). There are some other changes to the restrictions on reserved identifiers (and a large number of new reserved identifiers), but that doesn't make any difference in this particular case.

  3. In many instances of undefined behaviour, the intent is simply to let the compiler avoid doing extra sanity checks. Instead, it can just assume that you didn't break the rules, and do whatever it would otherwise do.

  4. Please don't use the name _sqrt, even though it will probably work. Names starting with underscores are all reserved, by the same §7.1.3. If the name starts with two underscores or an underscore followed by a capital letter, it is reserved for all uses. Other identifiers starting with an underscore are reserved for use at file scope (both as a function name and as a struct tag). So don't do that. If you want to use underscores to indicate that the name is somehow internal to your code, put it at the end of the identifier rather than at the beginning.

  5. Standard headers may also define the names of standard library functions as function-like macros, possibly in order to substitute a different reserved name, known to the compiler, which causes the generation of inline code, perhaps using special-purpose machine opcodes. Regardless, the standard requires that the functions exist, and it allows you to #undef the macros in order to guarantee that the actual function will be used. But it doesn't explicitly allow the names to be redefined.
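
To illustrate Note 5, here is a small sketch of forcing a call to the actual library function when a header might also provide a function-like macro of the same name. It uses toupper from <ctype.h> rather than sqrt purely to avoid depending on how the maths library is linked; the wrapper names are hypothetical. The parenthesised form relies on the same §7.1.4 that permits the #undef.

```c
#include <ctype.h>

/* Hypothetical wrapper: putting the name in parentheses means it is not
 * followed by '(' and so cannot be expanded as a function-like macro;
 * the real function is called. */
static int upper_via_parens(int c)
{
    return (toupper)(c);
}

/* Removing any macro definition also guarantees that later uses refer to
 * the real function. #undef is harmless if no such macro exists. */
#undef toupper

/* Hypothetical wrapper: after the #undef this is an ordinary call. */
static int upper_via_function(int c)
{
    return toupper(c);
}
```

Neither technique redefines toupper; both only select the function the library already provides, which is all the standard explicitly permits.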
