Performance of boost::mp11::mp_with_index compared to array of std::function

Time:12-28

Consider the following two snippets:

// Option (1).
boost::mp11::mp_with_index<N>(i,
                              [&](const auto i){ function<i>(/* args... */); });

and

// Option (2).
inline static const std::array<std::function<void(/* Args... */)>, N>
    functionArray{function<0>, ..., function<N-1>};
functionArray[i](/* args... */);

where N is a compile-time size roughly in the range [0, 20], i is a runtime index between 0 and N-1, and template <size_t I> void function(/* Args... */) is a function template with a known signature. Which of the two options is faster?
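To make Option (2) concrete, here is a minimal sketch that builds the table with std::make_index_sequence instead of listing every element by hand. The function template work<I>, its void(int&) signature and N = 8 are assumptions standing in for the question's function<I> and its unspecified arguments.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <utility>

// Hypothetical stand-in for the question's function<I>; the void(int&)
// signature is assumed so the caller can observe which I was selected.
template <std::size_t I>
void work(int& out) { out = static_cast<int>(I) * 10; }

constexpr std::size_t N = 8;  // assumed compile-time size

// Expands to {&work<0>, &work<1>, ..., &work<N-1>}, each wrapped in a std::function.
template <std::size_t... Is>
std::array<std::function<void(int&)>, sizeof...(Is)>
make_function_array(std::index_sequence<Is...>) {
    return {{&work<Is>...}};
}

// Option (2): runtime index -> type-erased call through std::function.
void call_option2(std::size_t i, int& out) {
    static const auto functionArray =
        make_function_array(std::make_index_sequence<N>{});
    functionArray[i](out);  // caller must guarantee i < N
}
```

For example, after int out; call_option2(3, out); the variable out holds 30.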

Note: I know that boost::mp11::mp_with_index basically creates a switch statement that converts a runtime index into a compile-time one. This introduces some indirection, but I expect it not to be too costly. Similarly, I know that std::function introduces some indirection due to type erasure. My question is: which of the two kinds of indirection is more efficient?
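As a rough model of the switch mentioned in the note, here is a hand-written dispatcher for a fixed N = 4. It is a simplified illustration, not Boost.Mp11's actual code; with_index_4 and work<I> are hypothetical names. Each case passes a different std::integral_constant type, which is what lets the callback use the index as a template argument.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for the question's function<I>.
template <std::size_t I>
void work(int& out) { out = static_cast<int>(I) * 10; }

// Simplified model of a runtime-to-compile-time index conversion for N = 4:
// every case hands the callback a distinct std::integral_constant type.
template <class F>
void with_index_4(std::size_t i, F&& f) {
    switch (i) {
    case 0: f(std::integral_constant<std::size_t, 0>{}); break;
    case 1: f(std::integral_constant<std::size_t, 1>{}); break;
    case 2: f(std::integral_constant<std::size_t, 2>{}); break;
    default: f(std::integral_constant<std::size_t, 3>{}); break;  // assumes i < 4
    }
}

// Option (1) expressed with the model above.
void call_option1(std::size_t i, int& out) {
    with_index_4(i, [&](auto I) {
        // decltype(I)::value is a constant expression, so it can be a template argument.
        work<decltype(I)::value>(out);
    });
}
```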

Answer:

std::array<std::function<void(Args...)>, N> is likely to introduce some overhead compared to an array of plain function pointers, std::array<void(*)(Args...), N>.
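A minimal sketch of the plain-pointer alternative, again using the hypothetical work<I> with an assumed void(int&) signature and N = 8. The table is a constexpr array of function pointers, so a call only has to load one address and jump to it.

```cpp
#include <array>
#include <cstddef>
#include <utility>

// Hypothetical stand-in for the question's function<I>.
template <std::size_t I>
void work(int& out) { out = static_cast<int>(I) * 10; }

constexpr std::size_t N = 8;  // assumed compile-time size

using Fn = void (*)(int&);

// Builds {&work<0>, ..., &work<N-1>} at compile time.
template <std::size_t... Is>
constexpr std::array<Fn, sizeof...(Is)> make_pointer_array(std::index_sequence<Is...>) {
    return {{&work<Is>...}};
}

inline constexpr auto pointerArray = make_pointer_array(std::make_index_sequence<N>{});

// The table is a constant expression, so it can be checked at compile time.
static_assert(pointerArray[2] == &work<2>, "pointer table built at compile time");

void call_pointer_table(std::size_t i, int& out) {
    pointerArray[i](out);  // caller must guarantee i < N
}
```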

Looking at the generated assembly at https://godbolt.org/z/a8z9aKs7P, the following observations can be made:

boost::mp11::mp_with_index is compiled to a branch table that holds N addresses of N small code stubs, each of which simply calls function<I>. A call therefore loads an address from the branch table, jumps to that stub, and then jumps again to the desired function.

This branch table could be simplified by storing the addresses of function<I> directly, so that only one jump is needed. That is what you get with an array of function pointers: the array essentially is the branch table.

std::function is similar, but calling a std::function is slightly more involved than calling a plain function pointer, because the call first goes through the type-erased wrapper before reaching the stored callable.
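To illustrate where the extra step can come from, here is a deliberately simplified model of a type-erased wrapper. TinyFunction is hypothetical and far simpler than any real std::function (no small-buffer optimisation, no copying, no empty state); it only shows that a call goes through a stored, type-specific thunk, which then calls the wrapped callable, rather than jumping straight to the target.

```cpp
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical, heavily simplified type-erased wrapper for void(int&).
class TinyFunction {
    template <class F>
    static void invoke_impl(void* object, int& out) {
        (*static_cast<F*>(object))(out);  // second hop: call the wrapped callable
    }
    template <class F>
    static void destroy_impl(void* object) { delete static_cast<F*>(object); }

    std::unique_ptr<void, void (*)(void*)> object_;  // owns the erased callable
    void (*invoke_)(void*, int&);                  // type-specific call thunk

public:
    template <class F>
    TinyFunction(F f)
        : object_(new F(std::move(f)), &destroy_impl<F>),
          invoke_(&invoke_impl<F>) {}

    void operator()(int& out) const {
        invoke_(object_.get(), out);  // first hop: indirect call through the thunk
    }
};

// Hypothetical stand-in for the question's function<I>.
template <std::size_t I>
void work(int& out) { out = static_cast<int>(I) * 10; }
```

Wrapping &work<4> this way means a call costs an indirect call to invoke_impl plus a second indirect call through the stored function pointer, whereas an element of a plain pointer table is called directly.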


Note that clang does a poor job of optimising this at -O2 and even at -O3. boost::mp11::mp_with_index<N> is essentially a chain of if/else statements that should be easy to compile as if it were a switch, but clang fails to do so and is left with N compare-and-conditional-jump instructions. The array of function pointers is the only good option here.
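For comparison, here is a dispatcher that is literally an if/else chain, written as a C++17 fold expression. with_index_chain is a hypothetical helper, not Boost.Mp11's implementation, and work<I> with N = 8 are again assumed stand-ins. Pasting it into Compiler Explorer next to a function-pointer table is one way to check whether a given compiler and optimisation level turns the chain into a jump table.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for the question's function<I>.
template <std::size_t I>
void work(int& out) { out = static_cast<int>(I) * 10; }

// Expands to (i == 0 ? (f(ic<0>), true) : false) || (i == 1 ? ...) || ...,
// so the comparisons run in order and stop at the first match.
// Returns false if i matched no index.
template <class F, std::size_t... Is>
bool with_index_chain(std::size_t i, F&& f, std::index_sequence<Is...>) {
    return ((i == Is ? (f(std::integral_constant<std::size_t, Is>{}), true) : false) || ...);
}

void call_chain(std::size_t i, int& out) {
    constexpr std::size_t N = 8;  // assumed compile-time size
    with_index_chain(i, [&](auto I) { work<decltype(I)::value>(out); },
                     std::make_index_sequence<N>{});
}
```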
