I know that the topic "short-circuit operators" has been discussed a lot here on StackExchange. But specifically with respect to the C language, there is an aspect that has bothered me for quite a while.
Consider some C code like
if (b && f(&s)) {...}
where b is of boolean type, s is a variable of some more or less elaborate struct type, and f is a getter for that struct. A more concrete example might look like:
if (active && isEmpty(&list)) {...}
Usually, I would expect that any decent compiler will inline the isEmpty function and optimize the whole expression down to a single AND assembler instruction, with the first operand already held in a data register and the second operand in memory, accessed via an 'address register with offset' addressing mode (or something similar).
However, since the compiler is forced (by the language rules) to generate the short-circuit form, this kind of optimization must not be done, which would ultimately produce drastically inefficient code. It is hard for me to believe that this is actually happening!
What version of assembler code will an optimizing C compiler usually generate?
CodePudding user response:
Yes, it is slightly inefficient, but not because of inlining; rather, because instruction reordering isn't possible. If the left operand is the most execution-heavy one, it still has to be executed first.
For example, consider search_algorithm() && bool_flag: the slow function must always be executed, even though the bool flag check is much quicker and, in case it evaluates to zero, the whole expression is false. The compiler can't do anything about that; we are forced to rely on manual optimization by swapping the operands.
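A minimal sketch of that manual swap (search_algorithm, bool_flag, and the surrounding code are hypothetical placeholders, not from the question):

#include <stdbool.h>

/* Hypothetical declarations, only to make the sketch self-contained. */
bool search_algorithm(const int *data, int n);  /* expensive */
extern bool bool_flag;                          /* cheap to test */

void handle(const int *data, int n)
{
    /* As written above: the expensive search always runs first. */
    if (search_algorithm(data, n) && bool_flag) {
        /* ... */
    }

    /* Manually swapped: the cheap flag test short-circuits the expensive call. */
    if (bool_flag && search_algorithm(data, n)) {
        /* ... */
    }
}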
Inlining may happen regardless of where the function call happens in an expression. It's not the inlining in itself that's blocked; it's the execution of the code in the function.
The && / || operators could potentially also mean that an extra branch gets introduced in the machine code, since if(a && b) is equivalent to if(a) { if(b) { ... } }.
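If both operands are cheap and free of side effects, one manual way to avoid that extra branch is to use the bitwise operator instead; whether that is actually faster depends on the target and on how predictable the branch is (a sketch, not from the original answer):

#include <stdbool.h>

bool with_branch(bool a, bool b)
{
    return a && b;   /* may compile to a test plus a conditional branch */
}

bool branch_free(bool a, bool b)
{
    return a & b;    /* both operands always evaluated; typically a single AND */
}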
Inlining example for x86:
#include <stdio.h>
int afunc (int a)
{
puts(__func__);
return a;
}
int bfunc (int b)
{
puts(__func__);
return b;
}
_Bool and (int a, int b)
{
return afunc(a) && bfunc(b);
}
gcc -O3 generates this code for the and function:
afunc:
push r12
mov r12d, edi
mov edi, OFFSET FLAT:__func__.1
call puts
mov eax, r12d
pop r12
ret
bfunc:
push r12
mov r12d, edi
mov edi, OFFSET FLAT:__func__.0
call puts
mov eax, r12d
pop r12
ret
and:
push rbp
mov ebp, esi
push rbx
mov ebx, edi
mov edi, OFFSET FLAT:__func__.1
sub rsp, 8
call puts
xor eax, eax
test ebx, ebx
jne .L12
add rsp, 8
pop rbx
pop rbp
ret
.L12:
mov edi, OFFSET FLAT:__func__.0
call puts
test ebp, ebp
setne al
add rsp, 8
pop rbx
pop rbp
ret
__func__.0:
.string "bfunc"
__func__.1:
.string "afunc"
The relevant part is the and: function:
- It inlines the function call by directly loading the string address OFFSET FLAT:__func__.1 corresponding to "afunc" into a register and then calling puts.
- It checks the value of a here: test ebx, ebx, and then introduces the extra branch jne .L12 based on the outcome, .L12 being the right operand evaluation.
Both functions are inlined. They are executed left to right, and the right operand is only evaluated in case the "jump if not equal to zero" leads to a jump; otherwise the function returns. clang and icc give nearly identical machine code.
CodePudding user response:
What version of assembler code will an optimizing C compiler usually generate?
This is an evolving question, and compilers have gotten better and better over the years. Good programmers become familiar with the capabilities of their compilers and write code that takes advantage of them. Thus, if active && isEmpty(&list) versus isEmpty(&list) && active is an issue, a good programmer will choose the one that is better for their situation, and/or they may dive into the implementation of isEmpty and see if it can be improved for this.
However, since the compiler is forced (by the language rules) to generate the short-circuit form,…
This is true and not true. The C standard specifies behavior on two levels. The semantics of C are specified using a model of an abstract computer that explicitly does what the standard says. For &&, it says "… the && operator guarantees left-to-right evaluation…," so it is true that this is what the abstract computer does in the model.
But a C implementation does not have to implement that model literally. On the “what is really implemented” level, instead of on the model level, a C implementation only has to match the observable behavior of the model, which is (C 2018 5.1.2.3 6):
- Accesses to volatile objects are evaluated strictly according to the rules of the abstract machine.
- At program termination, all data written into files shall be identical to the result that execution of the program according to the abstract semantics would have produced.
- The input and output dynamics of interactive devices shall take place as specified in 7.21.3. The intent of these requirements is that unbuffered or line-buffered output appear as soon as possible, to ensure that prompting messages actually appear prior to a program waiting for input.
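As a small sketch of where that boundary sits (device_flag and plain_flag are made-up names): a volatile read is observable behavior and stays pinned to the abstract machine, while an ordinary read is not and may be hoisted.

#include <stdbool.h>

volatile int device_flag;   /* hypothetical memory-mapped status flag */
int plain_flag;             /* ordinary object */

bool check_volatile(bool active)
{
    /* The read of device_flag is a volatile access, so it is observable
       behavior: it may only happen when active is true, exactly as the
       abstract machine prescribes. */
    return active && device_flag;
}

bool check_plain(bool active)
{
    /* plain_flag is an ordinary object, so the compiler may load it
       unconditionally and combine it with active, as long as the value of
       the whole expression stays the same. */
    return active && plain_flag;
}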
Now suppose the C compiler can see the implementation of isEmpty, as it would when inlining it. In this case, it can see all the effects of active && isEmpty(&list). If active && isEmpty(&list) contains no accesses to volatile objects and no writing to files or interacting with devices, then the actual implementation of the expression can be in any order at all, including any ordering of any subparts inside isEmpty, provided the reordering computes the correct result. So, in the actual implementation, it is not true that the compiler must guarantee the left-to-right ordering; it merely needs to guarantee the same result.
That correct result is an issue. Maybe active guarantees that evaluation of isEmpty(&list) will not use some null pointer, so active needs to be evaluated "first." However, if the compiler is inlining isEmpty, it could evaluate parts of isEmpty first, as long as it ensures none of its evaluations cause a problem if active is false.
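One hypothetical shape of that hazard (the types and field names below are made up for illustration, not taken from the question): suppose isEmpty reads through a pointer that is only valid while active is true.

#include <stdbool.h>
#include <stddef.h>

struct Storage { size_t count; };
struct List    { struct Storage *storage; };  /* may be NULL while inactive */

static inline bool isEmpty(const struct List *list)
{
    return list->storage->count == 0;   /* dereferences list->storage */
}

void example(bool active, const struct List *list)
{
    /* 'active' guards the dereference: when it is false, list->storage may be
       NULL, so the abstract machine never evaluates isEmpty(list). The compiler
       may still pre-compute parts of the inlined body that it can prove are
       harmless, but it must not hoist the load through the possibly-null
       pointer. */
    if (active && isEmpty(list)) {
        /* ... */
    }
}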
One thing that can block these sorts of optimizations is that isEmpty might use other functions or objects that the compiler does not have a complete view of, so the compiler is forced to implement the abstract computer more literally in certain respects. This goes back to the good-programmer issue: if this part of the program matters, a good programmer will consider the interactions between the code for isEmpty and the compiler's capabilities. And that will likely change over the coming years.