Consider this code:
#include <iostream>
typedef long xint;
template<int N>
struct foz {
    template<int i=0>
    static void foo(xint t) {
        for (int j=0; j<10; j++) {
            foo<i+1>(t+j);
        }
    }
    template<>
    static void foo<N>(xint t) {
        std::cout << t;
    }
};
int main() {
    foz<8>::foo<0>(0);
}
With clang -O0, it compiles in seconds and then runs for about 4 seconds.
However, with clang -O2, compiling takes a long time and a lot of memory. On godbolt, with 8 changed to a smaller value, it can be seen that the compiler fully expands the loops.
I don't want to turn optimization off entirely. I only want the compiler to stop expanding the loops recursively, so that the generated code behaves like an ordinary nested loop. What should I do?
CodePudding user response:
Loop unrolling can be disabled with a pragma (see on godbolt). The generated code is non-recursive and expressed as nested loops.
#pragma nounroll
for (int j=0; j<10; j++) {
    foo<i+1>(t+j);
}
You can also tune the unrolling manually instead of disabling it. Unrolling by 8 generates code similar to a loop that runs 8 times (godbolt).
#pragma unroll 8
for (int j=0; j<10; j++) {
    foo<i+1>(t+j);
}
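For reference, here is a minimal sketch of where the pragma would sit inside the member template. Several things in it are assumptions and not part of the question: the name foz_nounroll, the extra Base parameter (so the loop bound can be made small), the running total passed by reference in place of printing, and the use of C++17 if constexpr in place of the in-class explicit specialization. The pragma is guarded because it is a clang extension; whether the resulting assembly really is a plain nested loop is worth checking on the compiler actually in use.

```cpp
// Sketch only: foz_nounroll, Base and the `total` out-parameter are
// hypothetical names introduced for this illustration.
template <int N, int Base = 10>
struct foz_nounroll {
    template <int i = 0>
    static void foo(long t, long& total) {
        if constexpr (i == N) {
            // Innermost level: stands in for `std::cout << t` in the question.
            total += t;
        } else {
            // Ask clang not to unroll this loop; other compilers skip the pragma.
#if defined(__clang__)
#pragma nounroll
#endif
            for (int j = 0; j < Base; ++j) {
                foo<i + 1>(t + j, total);
            }
        }
    }
};
```

Calling foz_nounroll<2, 3>::foo(0, total) with total starting at 0 visits the nine pairs (j0, j1) with digits 0..2 and adds up j0 + j1 for each.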
CodePudding user response:
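To make it non-recursive, you might use an array of indices:
// requires <array>, <iterator> and <numeric>
static bool increase(std::array<int, N>& a)
{
    // Increment like an odometer, starting from the last index.
    for (auto rit = std::rbegin(a); rit != std::rend(a); ++rit) {
        if (++*rit == 10) {
            *rit = 0; // wrapped around: carry into the next index
        } else {
            return true;
        }
    }
    return false; // every index wrapped: all combinations visited
}
static void foo(xint t) {
    std::array<int, N> indexes{};
    do {
        std::cout << std::accumulate(std::begin(indexes), std::end(indexes), t);
    } while (increase(indexes));
}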
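A self-contained sketch of the same odometer idea, written so it can be checked on a small case. The free-function form, the Base template parameter, and the collect_sums helper (which stores the values in a vector where the answer prints them) are assumptions made for illustration only.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <vector>

// Advance the index array like an odometer with digits in [0, Base).
// Returns false once every digit has wrapped back to zero.
template <std::size_t N, int Base>
bool increase(std::array<int, N>& a) {
    for (auto rit = std::rbegin(a); rit != std::rend(a); ++rit) {
        assert(*rit >= 0 && *rit < Base);  // digits must stay in range
        if (++*rit == Base) {
            *rit = 0;  // carry into the next, more significant digit
        } else {
            return true;
        }
    }
    return false;
}

// Hypothetical helper: collect the value the innermost loop body of the
// recursive version would receive, in the same visiting order.
template <std::size_t N, int Base>
std::vector<long> collect_sums(long t) {
    static_assert(Base > 0, "Base must be positive");
    std::vector<long> out;
    std::array<int, N> indexes{};
    do {
        out.push_back(std::accumulate(indexes.begin(), indexes.end(), t));
    } while (increase<N, Base>(indexes));
    return out;
}
```

For example, collect_sums<2, 3>(0) walks the index pairs (0,0), (0,1), (0,2), (1,0), and so on up to (2,2), so the depth of nesting becomes the array length and no template recursion is involved.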
CodePudding user response:
The simplest solution is to mark the problematic function with the noinline function attribute, which is also supported by several other C++ compilers (e.g. GNU g++). Placing the attribute before the declaration keeps it valid on a function definition for g++ as well:
template<int i=0>
__attribute__((__noinline__)) static void foo(xint t) {
This instructs the compiler's optimizer to never inline calls to that function.
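As a rough sketch of how this could be wrapped for more than one compiler: the FOZ_NOINLINE macro, the foz_noinline name, the Base parameter, and returning the accumulated sum are all assumptions for illustration, and if constexpr (C++17) again stands in for the in-class explicit specialization.

```cpp
// Hypothetical portability macro; not part of the question's code.
#if defined(_MSC_VER)
#define FOZ_NOINLINE __declspec(noinline)
#elif defined(__GNUC__) || defined(__clang__)
#define FOZ_NOINLINE __attribute__((__noinline__))
#else
#define FOZ_NOINLINE
#endif

template <int N, int Base = 10>
struct foz_noinline {
    // Each level stays a separate, non-inlined function, so the optimizer
    // has no chance to expand the whole recursion into one body.
    template <int i = 0>
    FOZ_NOINLINE static long foo(long t) {
        if constexpr (i == N) {
            return t;  // innermost level: stands in for printing t
        } else {
            long total = 0;
            for (int j = 0; j < Base; ++j) {
                total += foo<i + 1>(t + j);
            }
            return total;
        }
    }
};
```

The trade-off is a real call per level, which is usually small next to the compile time and code size saved when N is large.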