Why does the compiler not automatically remove branches?-CodePudding

I was curious about whether the compiler automatically removed branches when it could so I set up an easy case:

bool testConditionsIf(int i, int j) {
    if (i == 1){
        if (j == 2){
            return true;
        }
        return j == 3;
    }
    return  j == 4;
}

bool testConditionsTernary(int i, int j) {
    return i == 1 ? (j == 2 ? true : j == 3) : j == 4;
}

bool testConditionsNoBranch(int i, int j) {
    return (i == 1 && (j == 2 || j == 3)) || (i != 1 && j == 4);
}

Each function is semantically equivalent to the next. The second rewrites the first with ternaries instead of if statements. The third rewrites the second by using boolean logic to handle both sides of the if statement. Interestingly enough, with gcc the first two produce the same code, but the third does not. The first two are assembled with conditional checks and jumps, much like the code is written. The third has a single conditional check on 1 and then returns the boolean computation of each branch. So I have a few questions:

Why was a condition check jump inserted into the third function when that is known to be a CPU slowdown?
Why dont all three compile to the same thing?
What is stopping the compiler from compiling the first two to the third?
The byte code for the first two does a condition check for j == 4 before doing a condition check for i == 1, why? Doesnt that seem like wasted work if i == 1?
If the compiler does a check for i == 1 before the jump, why is it doing that check again after the jump?
Why does testConditionsNoBranch have branches in it with -O0 optimization flag? The code has no branches in it as written.

https://www.godbolt.org/z/P69TGMsja

CodePudding user response：

The ternary operator is the same as if (resulting compiled code wise) and the code generated can contain branches.

it with -O0 optimization flag

Testing code quality with optimizations disabled makes no sense.

You need to convert it into an arithmetic expression which is very likely to be compiled without any branches.

bool testConditionsNoBranch1(int i, int j) {
    return (i == 1) * (j == 2)   (i == 1) * (j == 3)   (i != 1) * (j == 4);
}

and it will be compiled (using -Ofast) to

testConditionsNoBranch1(int, int):
        cmp     edi, 1
        sete    cl
        xor     eax, eax
        cmp     esi, 2
        sete    al
        xor     edx, edx
        and     eax, ecx
        cmp     esi, 3
        sete    dl
        and     edx, ecx
        add     eax, edx
        cmp     edi, 1
        setne   cl
        xor     edx, edx
        cmp     esi, 4
        sete    dl
        and     edx, ecx
        add     eax, edx
        setne   al
        ret

CodePudding user response：

I can certainly take a stab at some of these

Why was a condition check jump inserted into the third function when that is known to be a CPU slowdown?

This one I can't answer.

Why dont all three compile to the same thing?

This one is simple enough. The compilers are written by different people/teams, they have to follow the standard, but the standard allows some room for interpretation (for instance, GCC and Clang disagree on scope resolution if memory serves), and then there are limitations on what the compilers can do because of how they are written. While they all work in the same basic way, the implementations are quite different.
Or to put it shortly, they differ because they are different.

What is stopping the compiler from compiling the first two to the third?

Missed optimization opportunities?
Maybe missing optimizations?
Its also possible the compilers considered the 3rd and decided against it.
Clang offers great debugging info on its optimization passes, and that would be the first place i would look.

The byte code for the first two does a condition check for j == 4 before doing a condition check for i == 1, why? Doesnt that seem like wasted work if i == 1?

Obviously this is speculation, but the compiler is probably assuming i != 1. It is an int, Afterall. It has a 1 in 4,294,967,295 chance of being 1.

If the compiler does a check for i == 1 before the jump, why is it doing that check again after the jump?

After the jump its actually checking for 3. Its just coincidence that 3 - 2 == 1

Why does testConditionsNoBranch have branches in it with -O0 optimization flag? The code has no branches in it as written.

Logical operators (|| and &&) requires evaluating multiple conditions.

bool b = (i == 1 && j ==2)

is semantically the same as

bool b = false;
if (i == 1)
    if (j == 2)
        b = true;