I was curious about whether the compiler automatically removed branches when it could so I set up an easy case:
bool testConditionsIf(int i, int j) {
if (i == 1){
if (j == 2){
return true;
}
return j == 3;
}
return j == 4;
}
bool testConditionsTernary(int i, int j) {
return i == 1 ? (j == 2 ? true : j == 3) : j == 4;
}
bool testConditionsNoBranch(int i, int j) {
return (i == 1 && (j == 2 || j == 3)) || (i != 1 && j == 4);
}
Each function is semantically equivalent to the next. The second rewrites the first with ternaries instead of if statements. The third rewrites the second by using boolean logic to handle both sides of the if statement. Interestingly enough, with gcc the first two produce the same code, but the third does not. The first two are assembled with conditional checks and jumps, much like the code is written. The third has a single conditional check on 1 and then returns the boolean computation of each branch. So I have a few questions:
- Why was a condition check jump inserted into the third function when that is known to be a CPU slowdown?
- Why dont all three compile to the same thing?
- What is stopping the compiler from compiling the first two to the third?
- The byte code for the first two does a condition check for j == 4 before doing a condition check for i == 1, why? Doesnt that seem like wasted work if i == 1?
- If the compiler does a check for i == 1 before the jump, why is it doing that check again after the jump?
- Why does testConditionsNoBranch have branches in it with -O0 optimization flag? The code has no branches in it as written.
https://www.godbolt.org/z/P69TGMsja
CodePudding user response:
The ternary operator is the same as if
(resulting compiled code wise) and the code generated can contain branches.
it with -O0 optimization flag
Testing code quality with optimizations disabled makes no sense.
You need to convert it into an arithmetic expression which is very likely to be compiled without any branches.
bool testConditionsNoBranch1(int i, int j) {
return (i == 1) * (j == 2) (i == 1) * (j == 3) (i != 1) * (j == 4);
}
and it will be compiled (using -Ofast) to
testConditionsNoBranch1(int, int):
cmp edi, 1
sete cl
xor eax, eax
cmp esi, 2
sete al
xor edx, edx
and eax, ecx
cmp esi, 3
sete dl
and edx, ecx
add eax, edx
cmp edi, 1
setne cl
xor edx, edx
cmp esi, 4
sete dl
and edx, ecx
add eax, edx
setne al
ret
CodePudding user response:
I can certainly take a stab at some of these
- Why was a condition check jump inserted into the third function when that is known to be a CPU slowdown?
This one I can't answer.
- Why dont all three compile to the same thing?
This one is simple enough. The compilers are written by different people/teams, they have to follow the standard, but the standard allows some room for interpretation (for instance, GCC and Clang disagree on scope resolution if memory serves), and then there are limitations on what the compilers can do because of how they are written. While they all work in the same basic way, the implementations are quite different.
Or to put it shortly, they differ because they are different.
- What is stopping the compiler from compiling the first two to the third?
Missed optimization opportunities?
Maybe missing optimizations?
Its also possible the compilers considered the 3rd and decided against it.
Clang offers great debugging info on its optimization passes, and that would be the first place i would look.
- The byte code for the first two does a condition check for j == 4 before doing a condition check for i == 1, why? Doesnt that seem like wasted work if i == 1?
Obviously this is speculation, but the compiler is probably assuming i != 1
. It is an int, Afterall. It has a 1 in 4,294,967,295 chance of being 1.
- If the compiler does a check for i == 1 before the jump, why is it doing that check again after the jump?
After the jump its actually checking for 3. Its just coincidence that 3 - 2 == 1
- Why does testConditionsNoBranch have branches in it with -O0 optimization flag? The code has no branches in it as written.
Logical operators (||
and &&
) requires evaluating multiple conditions.
bool b = (i == 1 && j ==2)
is semantically the same as
bool b = false;
if (i == 1)
if (j == 2)
b = true;