How good is the Visual Studio compiler at branch-prediction for simple if-statements?


Here is some C++ pseudo-code as an example:

bool importantFlag = false;
for (SomeObject& obj : arr) {   // reference, so the loop acts on the real elements rather than copies
    if (obj.someBool) {
        importantFlag = true;
    }
    obj.doSomethingUnrelated();
}

Obviously, once the if-statement evaluates to true and runs the code inside, there is no reason to perform the check again, since the result will be the same either way. Is the compiler smart enough to recognize this, or will it keep checking the if-statement on every loop iteration and possibly redundantly assign importantFlag to true again? This could have a noticeable impact on performance if the number of loop iterations is large, and breaking out of the loop is not an option here.

I generally ignore these kinds of situations and just put my faith in the compiler, but it would be nice to know exactly how it handles cases like this.

CodePudding user response:

Branch prediction is a run-time thing, done by the CPU, not the compiler.

The relevant optimization here would be if-conversion to a very cheap branchless flag |= obj.someBool;.
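At the source level, the branchless version would look something like this sketch (reusing the question's hypothetical SomeObject and arr):

bool importantFlag = false;
for (SomeObject& obj : arr) {
    // Unconditional OR instead of a branch: nothing for the CPU to mispredict.
    importantFlag |= obj.someBool;
    obj.doSomethingUnrelated();
}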


Ahead-of-time C++ compilers produce machine code for the CPU to run; they aren't interpreters. See also Matt Godbolt's CppCon 2017 talk “What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid” and How to remove "noise" from GCC/clang assembly output? for more about looking at optimized compiler-generated asm.

I guess what you're suggesting could be to make a 2nd version of the loop that doesn't look at the bool, and to convert the if() into an if() goto that sets the flag once and then runs the other version of the loop from that point onward. That would likely not be worth it, since a single OR instruction is so cheap if other members of the same object are already being accessed.

It's a plausible optimization, but I don't think compilers typically do it for you. You can of course do it manually, although you'd have to iterate with an explicit iterator instead of a range-for, because you want to reuse the same iterator to start part-way through the range.
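A hand-rolled version of that multi-versioning could look roughly like the sketch below (reusing the question's hypothetical SomeObject and arr; whether it actually beats the single OR is something you'd have to measure):

bool importantFlag = false;
auto it = arr.begin();
// First loop: still tests the bool on every element, until it fires once.
for (; it != arr.end(); ++it) {
    if (it->someBool) {
        importantFlag = true;
        it->doSomethingUnrelated();
        ++it;               // this element is finished; resume after it
        break;
    }
    it->doSomethingUnrelated();
}
// Second loop: the flag is already known, so there is no per-element check.
for (; it != arr.end(); ++it) {
    it->doSomethingUnrelated();
}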


Branch likelihood estimation at compile time is something compilers do to decide whether branchy or branchless code is appropriate. For example, in gcc optimization flag -O3 makes code slower than -O2, GCC uses CMOV for a case that looks unpredictable, but when run on sorted data is actually very predictable. My answer there shows the asm real-world compilers make; note that they don't multi-version the loop, although that wouldn't have been possible there anyway, since the compiler didn't know the data was sorted.

CodePudding user response:

Peter Cordes has already given a great answer.

I'd also like to mention short-circuit evaluation.

In this example:

if (importantFlag || some_expensive_check()) {
    importantFlag = true;
}

Once importantFlag is set to true, the expensive check is never performed, since || stops evaluating as soon as its left-hand operand is true.
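Applied to the question's loop, that pattern might look like this sketch (someExpensiveCheck() is a hypothetical stand-in; the trick only pays off when the per-element test is genuinely expensive):

bool importantFlag = false;
for (SomeObject& obj : arr) {
    // Once importantFlag is true, || short-circuits and the check is skipped.
    if (importantFlag || obj.someExpensiveCheck()) {   // someExpensiveCheck() is hypothetical
        importantFlag = true;
    }
    obj.doSomethingUnrelated();
}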

CodePudding user response:

If you expect a large data set, you could just use two for loops, with the first one breaking as soon as importantFlag is set to true and the second one skipping the check. It's hard to know exactly which optimizations the compiler will make, since they aren't well documented.
