Why isn’t my code with C++20 likely/unlikely attributes faster?


The code below was compiled with Visual Studio 2019 Version 16.11.8 with /O2 optimization and run on an Intel CPU. I am trying to find the root cause of a counter-intuitive result: by a t-test, the version without the attributes is statistically faster than the version with them. Could it be some sort of cache effect? Or some magic the compiler is doing? I cannot really read assembly.

     #include <chrono>
     #include <iomanip>
     #include <iostream>
     #include <numeric>
     #include <random>
     #include <vector>
     #include <cmath>
     #include <functional>
    
    static const size_t NUM_EXPERIMENTS = 1000;
    
    double calc_mean(std::vector<double>& vec) {
        double sum = 0;
        for (auto& x : vec)
            sum += x;
        return sum / vec.size();
    }
    
    double calc_deviation(std::vector<double>& vec) {
        double sum = 0;
        for (int i = 0; i < vec.size(); i++)
            sum = sum + (vec[i] - calc_mean(vec)) * (vec[i] - calc_mean(vec));
        return sqrt(sum / (vec.size()));
    }
    
    double calc_ttest(std::vector<double> vec1, std::vector<double> vec2){
        double mean1 = calc_mean(vec1);
        double mean2 = calc_mean(vec2);
        double sd1 = calc_deviation(vec1);
        double sd2 = calc_deviation(vec2);
        double t_test = (mean1 - mean2) / sqrt((sd1 * sd1) / vec1.size() + (sd2 * sd2) / vec2.size());
        return t_test;
    }
    
    namespace with_attributes {
        double calc(double x) noexcept {
            if (x > 2) [[unlikely]]
                return sqrt(x);
            else [[likely]]
                return pow(x, 2);
        }
    }  // namespace with_attributes
    
    
    namespace no_attributes {
        double calc(double x) noexcept {
            if (x > 2)
                return sqrt(x);
            else
                return pow(x, 2);
        }
    }  // namespace no_attributes
    
    std::vector<double> benchmark(std::function<double(double)> calc_func) {
        std::vector<double> vec;
        vec.reserve(NUM_EXPERIMENTS);
    
        std::mt19937 mersenne_engine(12);
        std::uniform_real_distribution<double> dist{ 1, 2.2 };
    
        for (size_t i = 0; i < NUM_EXPERIMENTS; i++) {
    
            const auto start = std::chrono::high_resolution_clock::now();
            for (auto size{ 1ULL }; size != 100000ULL; ++size) {
                double x = dist(mersenne_engine);
                calc_func(x);
            }
            const std::chrono::duration<double> diff =
                std::chrono::high_resolution_clock::now() - start;
            vec.push_back(diff.count());
        }
        return vec;
    }
    
    int main() {
    
        std::vector<double> vec1 = benchmark(with_attributes::calc);
        std::vector<double> vec2 = benchmark(no_attributes::calc);
        std::cout << "with attribute: " << std::fixed << std::setprecision(6) << calc_mean(vec1) << '\n';
        std::cout << "without attribute: " << std::fixed << std::setprecision(6) << calc_mean(vec2) << '\n';
        std::cout << "T statistic: " << std::fixed << std::setprecision(6) << calc_ttest(vec1, vec2) << '\n';
    }

CodePudding user response:

Per godbolt, the two functions generate identical assembly under MSVC:

        movsd   xmm1, QWORD PTR __real@4000000000000000
        comisd  xmm0, xmm1
        jbe     SHORT $LN2@calc
        xorps   xmm1, xmm1
        ucomisd xmm1, xmm0
        ja      SHORT $LN7@calc
        sqrtsd  xmm0, xmm0
        ret     0
$LN7@calc:
        jmp     sqrt
$LN2@calc:
        jmp     pow

Since MSVC is not an open-source compiler, one can only guess why it chooses to ignore this optimization -- perhaps because both branches end in a function call (a tail call, hence `jmp` instead of `call`), so there is no cheap fall-through path for `[[likely]]` to favor, and the hint is too costly to act on relative to any benefit.

If the compiler is changed to Clang, it is smart enough to optimize `pow(x, 2)` into `x * x`, so different code is generated. Following that lead, if your code is modified into

        double calc(double x) noexcept {
            if (x > 2)
                return x + 1;
            else
                return x - 2;
        }

MSVC also emits a different layout for the two versions.

CodePudding user response:

Compilers are smart. These days, they are very smart. They do a lot of work to decide how best to arrange the code they generate.

The likely and unlikely attributes exist to solve extremely specific problems. Problems that only become apparent after deep analysis of the performance characteristics, and generated assembly, of a particular piece of performance-critical code. They are not a salve you rub into any old code to make it go faster.

They are a scalpel. And without surgical training, a scalpel is likely to be misused.

So unless you have specific knowledge of a performance problem which analysis of assembly shows can be solved by better branch prediction, you should not assume that any use of these attributes will make any particular code go faster.

That is, the result you're getting is entirely legitimate.
