Optimization for specific argument without template

Time:05-27

I ran into some optimized code that is fast, but it makes my code ugly.

A minimal example is as follows:


enum class Foo : char {
    A = 'A',
    B = 'B'
};

struct A_t {
    constexpr operator Foo() const { return Foo::A; }
};

void function_v1(Foo s){
   if(s == Foo::A){
      //Run special version of the code
   } else {
      //Run other version of the code
   }
}

template<class foo_t>
void function_v2(foo_t s){
   if(s == Foo::A){
      //Run special version of the code
   } else {
      //Run other version of the code
   }
}

int main(){

   // Version 1 of the function, simple call, no template
   function_v1(Foo::A);

   // Version 2 of the function, templated, but call is still simple
   function_v2(Foo::A);

   // Version 2 of the function, the argument is now not of type Foo, but of type A_t
   const A_t a; 
   function_v2(a);

}


For that last call, function_v2 is instantiated specifically for A_t. This may be bad for the size of the executable, but in my experiments the compiler recognizes that s == Foo::A always evaluates to true and optimizes the check away. With gcc, the check is not optimized away in the other versions, even with -O3.

I'm working on a performance-critical application, so such optimizations matter. However, I don't like the style of function_v2. To protect against calls with the wrong type, I would have to add something like enable_if. It complicates autocompletion because the parameter type is now a template parameter. And the user has to remember to call the function with that specifically typed variable instead of the enum value.
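For reference, the enable_if guard mentioned above could look roughly like the sketch below. It assumes C++17; the return values 1 and 2 are stand-ins for the two code paths, and the accepts helper is a hypothetical detector added only to show which argument types are rejected.

```cpp
#include <type_traits>
#include <utility>

enum class Foo : char { A = 'A', B = 'B' };

struct A_t {
    constexpr operator Foo() const { return Foo::A; }
};

// Participates in overload resolution only when foo_t is Foo itself
// or a type implicitly convertible to Foo (such as A_t).
template <class foo_t,
          std::enable_if_t<std::is_convertible_v<foo_t, Foo>, int> = 0>
constexpr int function_v2(foo_t s) {
    if (static_cast<Foo>(s) == Foo::A) {
        return 1;  // stand-in for the special version of the code
    } else {
        return 2;  // stand-in for the other version of the code
    }
}

// Hypothetical helper: true when function_v2(T) is a viable call.
template <class T, class = void>
struct accepts : std::false_type {};

template <class T>
struct accepts<T, std::void_t<decltype(function_v2(std::declval<T>()))>>
    : std::true_type {};

static_assert(accepts<Foo>::value, "Foo is accepted");
static_assert(accepts<A_t>::value, "A_t converts to Foo, so it is accepted");
static_assert(!accepts<int>::value, "int does not convert to a scoped enum");
```

With this guard, function_v2(42) fails to compile with a "no matching function" error instead of failing somewhere inside the body. In C++20 a requires-clause can express the same constraint more briefly.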

Is there a way to write a function in the style of function_v1, but still have the compiler make different instantiations? Maybe a slightly different coding style? Or a compiler hint in the code? Or some compiler flag that will make the compiler more likely to make multiple instantiations?

CodePudding user response:

Is there a way to write a function in the style of function_v1, but still have the compiler make different instantiations?

Let's expand your example a bit to better reveal the compiler's behavior:

enum class Foo : char {
    A = 'A',
    B = 'B'
};

struct A_t {
    constexpr operator Foo() const { return Foo::A; }
};

void foo();
void bar();

void function_v1(Foo s){
   if(s == Foo::A){
      foo();
   } else {
      bar();
   }
}

template<class foo_t>
void function_v2(foo_t s){
   if(s == Foo::A){
      foo();
   } else {
      bar();
   }
}

void test1(){
   function_v1(Foo::A);
}

void test2(){
   function_v2(Foo::A);
}

void test3(){
   const A_t a; 
   function_v2(a);
}

The resulting assembly for test1(), test2() and test3() is exactly the same: https://gcc.godbolt.org/z/443TqcczW

So what's going on here?

The if being optimized out in function_v2() has nothing to do with it being a template, but rather the fact that it is defined in a header (which is a necessity for templates), and the full implementation is visible at call sites.

All you have to do to get the same benefits for function_v1() is to define the function in a header and mark it as inline to avoid ODR violations. You will effectively get the exact same optimizations as are happening in function_v2().
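A minimal sketch of such a header, assuming C++17 (for inline variables). The file name and the two counters are made up for illustration; the counters only replace the real work so that the effect of each call can be observed.

```cpp
// function_v1.hpp (hypothetical file name)
#ifndef FUNCTION_V1_HPP
#define FUNCTION_V1_HPP

enum class Foo : char { A = 'A', B = 'B' };

// Illustrative counters standing in for the real work.
inline int special_calls = 0;
inline int other_calls = 0;

inline void foo() { ++special_calls; }
inline void bar() { ++other_calls; }

// `inline` allows this definition to appear in every translation unit
// that includes the header without violating the one-definition rule.
// It does not force the compiler to inline the call; it only makes the
// body visible at each call site so the optimizer can fold the branch.
inline void function_v1(Foo s) {
    if (s == Foo::A) {
        foo();
    } else {
        bar();
    }
}

#endif  // FUNCTION_V1_HPP
```

Whether the branch actually disappears at a call such as function_v1(Foo::A) still depends on the optimization level and the compiler's inlining heuristics, so it is worth checking the generated assembly.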

This only makes function_v1() equivalent to function_v2(), though: removing the branch is still up to the optimizer. If you want a guarantee, provide the value at compile time, as a template parameter:

template<Foo s>
void function_v3() {
    if constexpr (s == Foo::A) {
        foo();
    }
    else {
        bar();
    }
}

// usage:

function_v3<Foo::A>();

If you still need a runtime-evaluated version of the function, you could do something along these lines:

decltype(auto) function_v3(Foo s) {
    switch(s) {
        case Foo::A: 
            return function_v3<Foo::A>();
        case Foo::B: 
            return function_v3<Foo::B>();
    }
}

// Forced compile-time switch
function_v3<Foo::A>();

// At the mercy of the optimizer.
function_v3(some_val);
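One caveat if function_v3 ever returns a value: a Foo with underlying type char can hold values other than 'A' and 'B', and flowing off the end of a non-void function is undefined behavior. A sketch of the same dispatcher with that case handled, where the int return values and the choice of exception are assumptions made for illustration:

```cpp
#include <stdexcept>

enum class Foo : char { A = 'A', B = 'B' };

// Compile-time version: the branch is resolved when the template
// is instantiated.
template <Foo s>
constexpr int function_v3() {
    if constexpr (s == Foo::A) {
        return 1;  // stand-in for the special version of the code
    } else {
        return 2;  // stand-in for the other version of the code
    }
}

// Runtime front end: forwards to the matching instantiation.
inline int function_v3(Foo s) {
    switch (s) {
        case Foo::A:
            return function_v3<Foo::A>();
        case Foo::B:
            return function_v3<Foo::B>();
    }
    // Reached only for a value that is not a named enumerator.
    throw std::invalid_argument("unexpected Foo value");
}
```

Leaving out a default label keeps the compiler's warning about unhandled enumerators (for example -Wswitch in gcc and clang) useful when a new value is added to Foo.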

CodePudding user response:

How about using template specialization:

template<class T>
void function_v2_other(T s){
    //Run other version of the code
}

template<class T>
void function_v2(T s){
   function_v2_other(s);
}

template<>
void function_v2(Foo s){
   if(s == Foo::A){
      //Run special version of the code
   } else {
      function_v2_other(s);
   }
}