I am a beginner at CUDA and I encountered a somewhat confusing behavior of NVCC when trying out this simple "hello world from gpu" example:
// hello_world.cu
#include <cstdio>

__global__ void hello_world() {
    int i = threadIdx.x;
    printf("hello world from thread %d\n", i);
}

int main() {
    hello_world<<<1, 10>>>();
    cudaDeviceSynchronize();
    printf("Execution ends\n");
}
If compiled with
nvcc hello_world.cu
the output is:
hello world from thread 0
hello world from thread 1
hello world from thread 2
hello world from thread 3
hello world from thread 4
hello world from thread 5
hello world from thread 6
hello world from thread 7
hello world from thread 8
hello world from thread 9
Execution ends
However, if compiled with:
nvcc hello_world.cu -arch=sm_86
then the output is only:
Execution ends
I thought -arch=sm_86 was only to specify the compute architecture, but it seems to change the behavior of the program as well. Why?
I am using an RTX 2060 and NVCC 11.1.
A note: This is exercise 1-4 from Professional CUDA C Programming by John Cheng et al., which asks the reader to see what happens when the program is compiled with/without the -arch flag.
Answer:
The -arch flag of NVCC selects the GPU architecture the code is compiled for, which in effect sets the minimum compute capability that your program will require from the GPU in order to run properly.
The RTX 2060 has compute capability 7.5 (i.e. sm_75).
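This means that it cannot run code compiled for a higher compute capability (like sm_86): the kernel launch fails, and because the program never checks for errors, the failure is silent and only the host-side "Execution ends" is printed. Without the flag, NVCC 11.1 targets its default architecture (sm_52) and embeds PTX that the driver can JIT-compile for the RTX 2060, which is why that build works.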
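If you are unsure which compute capability your GPU has, you can ask the runtime. The following is a minimal sketch, not part of the original exercise: the file name and the helper format_arch are made up for illustration, while cudaGetDeviceCount and cudaGetDeviceProperties are standard CUDA runtime calls.

```cuda
// query_cc.cu (hypothetical file name)
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: turns (major, minor) into the matching -arch value, e.g. "sm_75".
static void format_arch(int major, int minor, char *buf, size_t n) {
    snprintf(buf, n, "sm_%d%d", major, minor);
}

int main() {
    int device_count = 0;
    cudaError_t err = cudaGetDeviceCount(&device_count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int dev = 0; dev < device_count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
            return 1;
        }
        char arch[16];
        format_arch(prop.major, prop.minor, arch, sizeof(arch));
        // Print the device name, its compute capability, and the -arch value to use.
        printf("Device %d: %s, compute capability %d.%d (-arch=%s)\n",
               dev, prop.name, prop.major, prop.minor, arch);
    }
    return 0;
}
```

Compile it without any -arch flag (nvcc query_cc.cu) so that it runs regardless of the GPU; on an RTX 2060 it should report compute capability 7.5.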
You can pass -arch=sm_75 to NVCC to target this compute capability.
You can call cudaGetLastError right after the kernel launch to check whether launching the kernel failed.
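As a rough sketch of what that check could look like, here is the program from the question with error handling added. The helper check is a hypothetical name introduced for this example; cudaGetLastError, cudaDeviceSynchronize and cudaGetErrorString are standard CUDA runtime calls.

```cuda
// hello_world_checked.cu (hypothetical file name)
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: reports a CUDA error and returns false if one occurred.
static bool check(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

__global__ void hello_world() {
    int i = threadIdx.x;
    printf("hello world from thread %d\n", i);
}

int main() {
    hello_world<<<1, 10>>>();

    // Launch errors (such as no compatible kernel code for this GPU) show up here.
    if (!check(cudaGetLastError(), "kernel launch")) return 1;

    // Errors raised while the kernel runs are returned by the synchronize call.
    if (!check(cudaDeviceSynchronize(), "cudaDeviceSynchronize")) return 1;

    printf("Execution ends\n");
    return 0;
}
```

Built with -arch=sm_86 and run on an RTX 2060, this version should stop with an error message instead of silently skipping the kernel output. The exact wording of the message depends on the CUDA version, but it typically says that no kernel image is available for the device.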