Why is compile-time execution significantly faster than runtime execution?


Contrary to what this question says, this piece of code is exhibiting some weird behaviour:

#include <chrono>
#include <iostream>

long long int fibonacci(int num) {
    if (num <= 2) return 1;
    return fibonacci(num - 1) + fibonacci(num - 2);
}

int main() {
    auto t1 = std::chrono::high_resolution_clock::now();
    long long int x = fibonacci(45);
    auto t2 = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double, std::milli> time(t2 - t1);
    std::cout << "Time taken: " << time.count() << "ms";
}

On my machine, this compiles in ~700ms with -O3 (GCC) and the output is:

Time taken: 2667.55ms

I rewrote the above code with constexpr as follows:

#include <chrono>
#include <iostream>

constexpr long long int fibonacci(int num) {
    if (num <= 2) return 1;
    return fibonacci(num - 1) + fibonacci(num - 2);
}

int main() {
    auto t1 = std::chrono::high_resolution_clock::now();
    constexpr long long int x = fibonacci(45);
    auto t2 = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double, std::milli> time(t2 - t1);
    std::cout << "Time taken: " << time.count() << "ms";
}

Which compiles in roughly the same time but the output is:

Time taken: 0ms

As it stands, evaluating fibonacci(45) at compile time is much faster than evaluating it at runtime. To rule out multi-core compilation as the reason (it isn't), I re-ran the second block above with fibonacci(60). The code again compiles in about the same amount of time, but the output is:

Time taken: 29499.6ms

What causes this big performance gap?

CodePudding user response:

Basically, for such a short piece of code, compile time does not matter much, even if you do compile-time evaluation.

The main problem here is the extremely poor algorithm. Each call makes two recursive calls, each of which makes two more, and so on; that gives exponential time complexity for such a simple problem.

The problem can, and should, be solved with an iterative approach.

Something like this:

unsigned long long fibonacci(size_t index) noexcept {
    unsigned long long f1{ 0ull }, f2{ 1ull }, f3{};
    while (index--) { f3 = f2 + f1; f1 = f2; f2 = f3; }
    return f2;
}

If you use this function then, regardless of whether optimization is enabled or not, your main will run in well under 1ms.
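
For illustration, here is a minimal main (my sketch, not part of the original answer) that times the iterative version the same way the question does. Note that this function is 0-based, so fibonacci(44) here corresponds to fibonacci(45) in the question's recursion:

#include <chrono>
#include <cstddef>
#include <iostream>

unsigned long long fibonacci(size_t index) noexcept {
    unsigned long long f1{ 0ull }, f2{ 1ull }, f3{};
    while (index--) { f3 = f2 + f1; f1 = f2; f2 = f3; }
    return f2;
}

int main() {
    auto t1 = std::chrono::high_resolution_clock::now();
    unsigned long long x = fibonacci(44);   // same value as the question's fibonacci(45)
    auto t2 = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double, std::milli> time(t2 - t1);
    std::cout << "fibonacci(45) = " << x << '\n';   // printing x keeps the call observable
    std::cout << "Time taken: " << time.count() << "ms\n";
}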

In your original main function, you do not use the result of the calculation, i.e. the variable x. The optimizing compiler can therefore eliminate the complete function call: the result is not used, so there is no need to call the function.

Try the following experiment.

Add std::cout << x << '\n'; as the last statement in your main function. You will be surprised: with optimization enabled, the function will be called at the very end, and your time measurement then measures nothing.
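
A sketch of that experiment (my modification of the question's first snippet):

#include <chrono>
#include <iostream>

long long int fibonacci(int num) {
    if (num <= 2) return 1;
    return fibonacci(num - 1) + fibonacci(num - 2);
}

int main() {
    auto t1 = std::chrono::high_resolution_clock::now();
    long long int x = fibonacci(45);
    auto t2 = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double, std::milli> time(t2 - t1);
    std::cout << "Time taken: " << time.count() << "ms\n";
    std::cout << x << '\n';   // x is now observable, which changes what the optimizer may do with the call
}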

Now to the compile-time version. The compiler also uses optimized code internally for constant evaluation, and it effectively turns the nonsensical recursive approach into an iterative one. Calculating those values then takes less time than the compiler's own overhead.

So, that is the real reason.

And the Fibonacci numbers can always be computed at compile time: there are only 93 possible results that fit into a 64-bit result value.

See the following approach, which creates a compile-time array of all Fibonacci numbers. If you want the n-th number, it is just a lookup.

This will run very fast, in either non-optimized or optimized mode.

It is the fastest possible solution for this problem.

#include <iostream>
#include <utility>
#include <array>
#include <algorithm>
#include <iterator>

// All done during compile time -------------------------------------------------------------------
constexpr unsigned long long getFibonacciNumber(size_t index) noexcept {
    unsigned long long f1{ 0ull }, f2{ 1ull }, f3{};
    while (index--) { f3 = f2 + f1; f1 = f2; f2 = f3; }
    return f2;
}
// Some helper to create a constexpr std::array initialized by a generator function
template <typename Generator, size_t ... Indices>
constexpr auto generateArrayHelper(Generator generator, std::index_sequence<Indices...>) {
    return std::array<decltype(std::declval<Generator>()(size_t{})), sizeof...(Indices) > { generator(Indices)... };
}
template <size_t Size, typename Generator>
constexpr auto generateArray(Generator generator) {
    return  generateArrayHelper(generator, std::make_index_sequence<Size>());
}
constexpr size_t MaxIndexFor64BitFibonacci = 93;

// This is the definition of a std::array<unsigned long long, 93> with all Fibonacci numbers in it
constexpr auto Fib = generateArray<MaxIndexFor64BitFibonacci>(getFibonacciNumber);
// End of: All done during compile time -----------------------------------------------------------


// The only code executed during runtime
int main() {

    // Show all Fibonacci numbers up to border value
    for (const auto fib : Fib) std::cout << fib << '\n';
}
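
As a usage example (my addition, assuming the definitions above are in scope): the array is 0-based, with Fib[0] == 1, so Fib[44] holds the value the question computes as fibonacci(45), and the whole recursive computation becomes a single lookup:

// Alternative main for the snippet above: one constant-time lookup instead of recursion
int main() {
    std::cout << "fibonacci(45) = " << Fib[44] << '\n';
}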

CodePudding user response:

I'm not sure why this question even appears, unless you didn't quite read up on how constexpr is described. You can't reason by "guessing" about what's going on in a program written in C++ unless the behaviour is undefined or platform-defined.

The C++ standard doesn't describe memoization in any way, so we have no reason to suspect that such a complex caching procedure is used. The languages that support memoization (or things called "interning", "tombstoning", and similar) are usually the same ones that run intermediate code in some virtual machine and de facto do not provide mechanics for compile-time evaluation, as that can never be done for an unknown target platform.
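
If you want caching, you have to write it yourself. A minimal sketch of explicit memoization for the recursive Fibonacci (my illustration, not something the standard or any compiler does for you):

#include <array>
#include <iostream>

// Hand-written cache: entries are zero until computed.
long long fib_memo(int num, std::array<long long, 94>& cache) {
    if (num <= 2) return 1;
    if (cache[num] != 0) return cache[num];   // already computed once
    cache[num] = fib_memo(num - 1, cache) + fib_memo(num - 2, cache);
    return cache[num];
}

int main() {
    std::array<long long, 94> cache{};        // zero-initialized
    std::cout << fib_memo(45, cache) << '\n'; // only a linear number of real additions
}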

constexpr allows the creation of "constants" which are initialized through code executed on the compiler's side (with all possible platform-defined effects, and with all the restrictions the standard imposes on such evaluation).

Without it, the as-if rule applies, which includes the freedom for the compiler not to change the code at all. The compiler is also allowed to throw the whole thing away if the result is not observable and it is clear that the function call has no side effects; this is easier to check when constexpr is used, because then there are no side effects by definition.

In the constexpr version, the whole recursion is unwound at compile time because you explicitly told the compiler that this is possible. The limitations imposed on constexpr functions by the standard guarantee that the function is fit for such evaluation.
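
One way to check that the evaluation really does happen at compile time is to force it into a context that requires a constant expression. A minimal sketch (my illustration, using a smaller argument so compilation stays fast; requires C++14 or later for this form of constexpr function):

constexpr long long int fibonacci(int num) {
    if (num <= 2) return 1;
    return fibonacci(num - 1) + fibonacci(num - 2);
}

// If this compiles, fibonacci(20) was evaluated by the compiler.
static_assert(fibonacci(20) == 6765, "must be evaluable at compile time");

int main() {
    constexpr long long int x = fibonacci(20);   // initializer is a constant expression, no runtime work
    return x == 6765 ? 0 : 1;                    // always 0 here; just keeps x used
}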

There can be unexpected results when cross-compiling for a platform whose behaviour differs from the host's: you may get constexpr values that reflect the compiler-side platform.

With recursive constexpr functions you risk exceeding the limit on constexpr evaluation, which is implementation-defined. E.g. your case wouldn't compile at all with some builds of g++, because the non-optimal recursion requires more than 32 million operations.

In the second case, no code is executed between the calls to chrono apart from the initialization of x with the value calculated by the compiler. If you used static constexpr long long int x for the local variable, even that initialization would be gone; it would be a hard-coded constant value somewhere in the data segment, or in the code wherever it is used. static formally prolongs the life of the object until the end of the process, while an automatic constexpr variable is conceptually initialized each time the function is entered (irrelevant here, since main() is entered only once).
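
A sketch of that static constexpr variant (my modification of the question's second snippet):

#include <chrono>
#include <iostream>

constexpr long long int fibonacci(int num) {
    if (num <= 2) return 1;
    return fibonacci(num - 1) + fibonacci(num - 2);
}

int main() {
    auto t1 = std::chrono::high_resolution_clock::now();
    static constexpr long long int x = fibonacci(45);   // a hard-coded constant; nothing left to do at runtime
    auto t2 = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double, std::milli> time(t2 - t1);
    std::cout << "x = " << x << '\n';
    std::cout << "Time taken: " << time.count() << "ms\n";
}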
