What does the --gpu-architecture (-arch) flag of NVCC do?

I am a beginner at CUDA and ran into some confusing NVCC behavior while trying out this simple "hello world from GPU" example:

// hello_world.cu

#include <cstdio>
__global__ void hello_world() {
    int i = threadIdx.x;
    printf("hello world from thread %d\n", i);
}

int main() {
    hello_world<<<1, 10>>>();
    cudaDeviceSynchronize();
    printf("Execution ends\n");
}

If compiled with

nvcc hello_world.cu

the output is:

hello world from thread 0
hello world from thread 1
hello world from thread 2
hello world from thread 3
hello world from thread 4
hello world from thread 5
hello world from thread 6
hello world from thread 7
hello world from thread 8
hello world from thread 9
Execution ends

However, if compiled with:

nvcc hello_world.cu -arch=sm_86

Then the output is only

Execution ends

I thought -arch=sm_86 only specified the target compute architecture, but it seems to change the behavior of the program as well. Why?

I am using an RTX 2060 and NVCC 11.1.

A note: This is exercise 1-4 from Professional CUDA C Programming by John Cheng et al., which asks the reader to see what happens when the program is compiled with/without the -arch flag.

Answer:

The -arch flag tells NVCC which GPU architecture to generate code for, and therefore sets the minimum compute capability the GPU must have to run your kernels.

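The RTX 2060 has compute capability 7.5 (i.e. sm_75), so it cannot run kernels compiled only for a higher compute capability such as sm_86.
With -arch=sm_86 the kernel launch fails with cudaErrorNoKernelImageForDevice ("no kernel image is available for execution on the device"), so no thread ever prints, while the host code carries on and prints "Execution ends". Without -arch, NVCC targets a lower default architecture whose code the RTX 2060 can run, which is why the first build works.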

Use -arch=sm_75 (e.g. nvcc hello_world.cu -arch=sm_75) to compile for this compute capability.
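To find the right value for your own GPU instead of looking it up, you can query it at runtime. Below is a minimal sketch using cudaGetDeviceCount and cudaGetDeviceProperties; the file name query_cc.cu and the sm_version helper are my own hypothetical names, not part of CUDA. On an RTX 2060 it should report compute capability 7.5.

```cuda
// query_cc.cu (hypothetical name): print the compute capability of each GPU.
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not a CUDA API): combine major/minor into the
// number used in -arch=sm_XY, e.g. 7, 5 -> 75.
int sm_version(int major, int minor) { return major * 10 + minor; }

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
            return 1;
        }
        // prop.major / prop.minor hold the compute capability, e.g. 7 and 5.
        std::printf("device %d: %s, compute capability %d.%d (-arch=sm_%d)\n",
                    dev, prop.name, prop.major, prop.minor,
                    sm_version(prop.major, prop.minor));
    }
    return 0;
}
```

Build and run it with nvcc query_cc.cu -o query_cc && ./query_cc, then pass the printed sm_XY value to -arch.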

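Call cudaGetLastError right after the launch to check whether the kernel actually started; in this case it would have reported the error instead of letting the program fail silently.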
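A minimal sketch of the question's program with error checking added. The check function is a hypothetical helper of my own, not a CUDA API; it only wraps cudaGetErrorString. cudaGetLastError catches launch failures such as a missing kernel image, and the return value of cudaDeviceSynchronize catches errors raised while the kernel runs.

```cuda
// hello_world_checked.cu: the question's program plus launch error checking.
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not a CUDA API): print a message and return false
// if err is not cudaSuccess.
bool check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

__global__ void hello_world() {
    int i = threadIdx.x;
    printf("hello world from thread %d\n", i);
}

int main() {
    hello_world<<<1, 10>>>();
    // Launch errors, such as no kernel image for this GPU, are reported here.
    if (!check(cudaGetLastError(), "kernel launch")) return 1;
    // Errors raised while the kernel runs show up when synchronizing.
    if (!check(cudaDeviceSynchronize(), "kernel execution")) return 1;
    std::printf("Execution ends\n");
    return 0;
}
```

Compiled with -arch=sm_86 and run on the RTX 2060, this version should exit with a message along the lines of "kernel launch failed: no kernel image is available for execution on the device" rather than printing only "Execution ends".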