Example code: https://github.com/Saitama10000/Mixing-cuda-and-cpp-templates-and-lambdas
- I want to have a kernel in a .cu file that takes an extended __host__ __device__ lambda as a parameter and uses it to operate on data.
- I am using a .cuh file to wrap the kernel execution in a wrapper function.
- I include the .cuh file in main.cpp and use the wrapper function to do the computations.
- I need this .cuh/.cu way of organizing the code.
- I'm using C++20.
The example code doesn't compile. I am supposed to add a template instantiation in the .cu
file, but I don't know how. I've tried this:
typedef float(*op)(float);
template std::vector<float> f<op>(std::vector<float> const&, op);
but I still get this compilation error:
In file included from Mixing-cuda-and-cpp-templates-and-lambdas/main.cpp:6:
Mixing-cuda-and-cpp-templates-and-lambdas/kernel.cuh:6:20: error: ‘std::vector<float> f(const std::vector<float>&, FUNC) [with FUNC = main()::<lambda(float)>]’, declared using local type ‘main()::<lambda(float)>’, is used but never defined [-fpermissive]
6 | std::vector<float> f(std::vector<float> const& a, FUNC func);
| ^
Mixing-cuda-and-cpp-templates-and-lambdas/kernel.cuh:6:20: warning: ‘std::vector<float> f(const std::vector<float>&, FUNC) [with FUNC = main()::<lambda(float)>]’ used but never defined
make[2]: *** [CMakeFiles/main.dir/build.make:82: CMakeFiles/main.dir/main.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:95: CMakeFiles/main.dir/all] Error 2
make: *** [Makefile:103: all] Error 2
CodePudding user response:
There are two problems with your approach.
First, each lambda expression has its own unique type, even if the parameters and the function body are the same.
For example, the following assertion fails:
#include <type_traits>
int main(){
auto lambda1 = [](){};
auto lambda2 = [](){};
static_assert(std::is_same<decltype(lambda1), decltype(lambda2)>::value, "not same");
}
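That means that even if you somehow managed to explicitly instantiate your template with the type of a lambda, it would not be the type of the lambda that you actually pass to your function. The function pointer type in your attempt has the same problem: FUNC is deduced as the lambda's closure type, not as float(*)(float), so that instantiation is never the one main.cpp asks for. This problem can be solved by using functors instead of lambdas. Define a set of functors that may be used to call the function, and use them for the template instantiations.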
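A minimal sketch of the functor approach follows. The functor names Square and AddOne and the HD macro are hypothetical and not part of the example repository, and the plain loop is only a stand-in for the device copy and kernel launch, so the snippet also compiles with an ordinary C++ compiler. The comments mark which file each part would belong to.

```cpp
#include <cstddef>
#include <vector>

// Expands to the CUDA annotations under nvcc and to nothing otherwise,
// so this sketch also builds with a plain C++ compiler (assumption: HD is
// a name made up for this example).
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// --- would live in kernel.cuh: named functor types and the declaration ---
struct Square {  // hypothetical functor
    HD float operator()(float x) const { return x * x; }
};

struct AddOne {  // hypothetical functor
    HD float operator()(float x) const { return x + 1.0f; }
};

template <typename FUNC>
std::vector<float> f(std::vector<float> const& a, FUNC func);

// --- would live in kernel.cu: the definition and explicit instantiations ---
template <typename FUNC>
std::vector<float> f(std::vector<float> const& a, FUNC func) {
    std::vector<float> out(a.size());
    // Stand-in for copying to the device and launching the kernel.
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = func(a[i]);
    }
    return out;
}

// One explicit instantiation per functor type that callers may pass.
template std::vector<float> f<Square>(std::vector<float> const&, Square);
template std::vector<float> f<AddOne>(std::vector<float> const&, AddOne);
```

With this layout, main.cpp would call f(a, Square{}) instead of passing a lambda. Because Square is a named type visible in the header, the linker can match the call to the instantiation emitted from the .cu file.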
Second, you want to pass a __host__ __device__ function. This annotation is a CUDA C++ extension, which cannot be compiled with a standard C++ compiler. The file that creates the lambda (main.cpp here) must therefore be compiled with a CUDA compiler such as nvcc instead, which in turn allows you to place your kernel and wrappers directly in the .cuh file.
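As a host-only sketch of why moving the definition into the header removes the "used but never defined" error: when the template body is visible at the call site, the compiler instantiates it for the local lambda type in the calling translation unit, so no explicit instantiation is needed. The names f_header and add_offset are hypothetical, and the loop again stands in for the kernel launch. In a real CUDA build the lambda would carry the __host__ __device__ annotation and the calling file would be compiled by nvcc with extended lambdas enabled.

```cpp
#include <cstddef>
#include <vector>

// --- would live in kernel.cuh: declaration and definition together ---
template <typename FUNC>
std::vector<float> f_header(std::vector<float> const& a, FUNC func) {
    std::vector<float> out(a.size());
    // Stand-in for copying to the device and launching the kernel.
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = func(a[i]);
    }
    return out;
}

// --- would live in the calling file ---
// The lambda has a local, unnamed type; f_header is instantiated for it
// right here because its definition is visible.
inline std::vector<float> add_offset(std::vector<float> const& a, float offset) {
    auto op = [offset](float x) { return x + offset; };
    return f_header(a, op);
}
```

The trade-off is that every file including the header has to be compiled by the CUDA compiler once the body contains a real kernel launch.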