I know that the topic "short-circuit operators" has been discussed a lot here on StackExchange. But specifically with respect to the C language, there is an aspect that has bothered me for quite a while.
Consider some C code like
if (b && f(&s)) {...}
where b is of boolean type, s is a variable of some more or less elaborate struct type, and f is a getter for that struct. A more concrete example might look like:
if (active && isEmpty(&list)) {...}
Usually, I would expect that any decent compiler will inline the isEmpty function and optimize the whole expression down to a single AND assembler instruction, with the first operand already held in a data register and the second operand in memory, accessed via an 'address register with offset' addressing mode (or something similar).
However, since the compiler is forced (by the language rules) to generate the short-circuit form, this kind of optimization must not be done, which would ultimately produce drastically inefficient code. It is hard for me to believe that this is actually happening!
What version of assembler code will an optimizing C compiler usually generate?
CodePudding user response:
Yes, it is slightly inefficient, but not because of inlining; rather, because instruction reordering isn't possible. If the left operand is the most execution-heavy one, it still has to be executed first.
For example, consider search_algorithm() && bool_flag: the slow function must always be executed, even though the bool flag check is much quicker and, in case it evaluates to zero, the whole expression is false. The compiler can't do anything about that; we are forced to rely on manual optimization by swapping the operands.
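A minimal sketch of that manual swap (search_algorithm, bool_flag, and the surrounding code are hypothetical placeholders, not from the question):

#include <stdbool.h>

/* Hypothetical declarations, only to make the sketch self-contained. */
bool search_algorithm(const int *data, int n);  /* expensive */
extern bool bool_flag;                          /* cheap to test */

void handle(const int *data, int n)
{
    /* As written above: the expensive search always runs first. */
    if (search_algorithm(data, n) && bool_flag) {
        /* ... */
    }

    /* Manually swapped: the cheap flag test short-circuits the expensive call. */
    if (bool_flag && search_algorithm(data, n)) {
        /* ... */
    }
}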
Inlining may happen regardless of where the function call happens in an expression. It's not the inlining in itself that's blocked; it's the execution of the code in the function.
The && / || operators could potentially also mean that an extra branch gets introduced in the machine code, since if(a && b) is equivalent to if(a) { if(b) { ... } }.
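If both operands are cheap and free of side effects, one manual way to avoid that extra branch is to use the bitwise operator instead; whether that is actually faster depends on the target and on how predictable the branch is (a sketch, not from the original answer):

#include <stdbool.h>

bool with_branch(bool a, bool b)
{
    return a && b;   /* may compile to a test plus a conditional branch */
}

bool branch_free(bool a, bool b)
{
    return a & b;    /* both operands always evaluated; typically a single AND */
}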
Inlining example for x86:
#include <stdio.h>
int afunc (int a)
{
puts(__func__);
return a;
}
int bfunc (int b)
{
puts(__func__);
return b;
}
_Bool and (int a, int b)
{
return afunc(a) && bfunc(b);
}
gcc -O3 generates this code for the and function:
afunc:
push r12
mov r12d, edi
mov edi, OFFSET FLAT:__func__.1
call puts
mov eax, r12d
pop r12
ret
bfunc:
push r12
mov r12d, edi
mov edi, OFFSET FLAT:__func__.0
call puts
mov eax, r12d
pop r12
ret
and:
push rbp
mov ebp, esi
push rbx
mov ebx, edi
mov edi, OFFSET FLAT:__func__.1
sub rsp, 8
call puts
xor eax, eax
test ebx, ebx
jne .L12
add rsp, 8
pop rbx
pop rbp
ret
.L12:
mov edi, OFFSET FLAT:__func__.0
call puts
test ebp, ebp
setne al
add rsp, 8
pop rbx
pop rbp
ret
__func__.0:
.string "bfunc"
__func__.1:
.string "afunc"
The relevant part is the and: function:
- It inlines the function call by directly loading the string address OFFSET FLAT:__func__.1 corresponding to "afunc" into a register and then calling puts.
- It checks the value of a here: test ebx, ebx, and then introduces the extra branch jne .L12 based on the outcome, .L12 being the right operand evaluation.
Both functions are inlined. They are executed left to right, and the right operand is only evaluated in case the "jump if not equal to zero" leads to a jump; otherwise the function returns. clang and icc give nearly identical machine code.
CodePudding user response:
What version of assembler code will an optimizing C compiler usually generate?
This is an evolving question, and compilers have gotten better and better over the years. Good programmers become familiar with the capabilities of their compilers and write code that takes advantage of them. Thus, if active && isEmpty(&list) versus isEmpty(&list) && active is an issue, a good programmer will choose the one that is better for their situation, and/or they may dive into the implementation of isEmpty and see if it can be improved for this.
However, since the compiler is forced (by the language rules) to generate the short-circuit form,…
This is true and not true. The C standard specifies behavior on two levels. The semantics of C are specified using a model of an abstract computer that explicitly does what the standard says. For &&, it says "… the && operator guarantees left-to-right evaluation…," so it is true that this is what the abstract computer does in the model.
But a C implementation does not have to implement that model literally. On the “what is really implemented” level, instead of on the model level, a C implementation only has to match the observable behavior of the model, which is (C 2018 5.1.2.3 6):
- Accesses to volatile objects are evaluated strictly according to the rules of the abstract machine.
- At program termination, all data written into files shall be identical to the result that execution of the program according to the abstract semantics would have produced.
- The input and output dynamics of interactive devices shall take place as specified in 7.21.3. The intent of these requirements is that unbuffered or line-buffered output appear as soon as possible, to ensure that prompting messages actually appear prior to a program waiting for input.
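As a small sketch of where that boundary sits (device_flag and plain_flag are made-up names): a volatile read is observable behavior and stays pinned to the abstract machine, while an ordinary read is not and may be hoisted.

#include <stdbool.h>

volatile int device_flag;   /* hypothetical memory-mapped status flag */
int plain_flag;             /* ordinary object */

bool check_volatile(bool active)
{
    /* The read of device_flag is a volatile access, so it is observable
       behavior: it may only happen when active is true, exactly as the
       abstract machine prescribes. */
    return active && device_flag;
}

bool check_plain(bool active)
{
    /* plain_flag is an ordinary object, so the compiler may load it
       unconditionally and combine it with active, as long as the value of
       the whole expression stays the same. */
    return active && plain_flag;
}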
Now suppose the C compiler can see the implementation of isEmpty, as it would when inlining it. In this case, it can see all the effects of active && isEmpty(&list). If active && isEmpty(&list) contains no accesses to volatile objects and no writing to files or interacting with devices, then the actual implementation of the expression can be in any order at all, including any ordering of any subparts inside isEmpty, provided the reordering computes the correct result. So, in the actual implementation, it is not true that the compiler must guarantee the left-to-right ordering; it merely needs to guarantee the same result.
That correct result is an issue. Maybe active guarantees that evaluation of isEmpty(&list) will not use some null pointer, so active needs to be evaluated "first." However, if the compiler is inlining isEmpty, it could evaluate parts of isEmpty first, as long as it ensures none of its evaluations cause a problem if active is false.
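One hypothetical shape of that hazard (the types and field names below are made up for illustration, not taken from the question): suppose isEmpty reads through a pointer that is only valid while active is true.

#include <stdbool.h>
#include <stddef.h>

struct Storage { size_t count; };
struct List    { struct Storage *storage; };  /* may be NULL while inactive */

static inline bool isEmpty(const struct List *list)
{
    return list->storage->count == 0;   /* dereferences list->storage */
}

void example(bool active, const struct List *list)
{
    /* 'active' guards the dereference: when it is false, list->storage may be
       NULL, so the abstract machine never evaluates isEmpty(list). The compiler
       may still pre-compute parts of the inlined body that it can prove are
       harmless, but it must not hoist the load through the possibly-null
       pointer. */
    if (active && isEmpty(list)) {
        /* ... */
    }
}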
One thing that can block these sorts of optimizations is that isEmpty might use other functions or objects that the compiler does not have a complete view of, so the compiler is forced to implement the abstract computer more literally in certain respects. This goes back to the good-programmer issue: if this part of the program matters, a good programmer will consider the interactions between the code for isEmpty and the compiler's capabilities. And that will likely change over the coming years.