CUDA: number of elements is larger than the number of assigned threads

I am new to CUDA programming. I am curious what happens if the number of elements is larger than the number of threads.

Consider this simple vector_add example:

__global__
void add(int n, float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = x[i] + y[i];
}

Say the array has 10,000,000 elements, and we launch this kernel with 64 blocks of 256 threads each:

int n = 1e7;
int grid_size = 64;
int block_size = 256;

Then only 64*256 = 16384 threads are launched. What would happen to the rest of the array elements?

CodePudding user response:

what would happen to the rest of the array elements?

Nothing at all. They would not be touched and would remain unchanged. Each thread handles only its own index i, and no thread with i >= 16384 exists, so nothing ever reads or writes those elements. The kernel never writes to x in any case, so this only concerns y: the values of y[0..16383] would reflect the result of the vector add, and the values of y[16384..9999999] would be unchanged.
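
To see this concretely, here is a minimal sketch of a host program that launches the kernel from the question with 64 blocks of 256 threads and then counts how many elements of y changed. The fill values (1.0f for x, 2.0f for y) and the host-side names are assumptions chosen for illustration, and error handling is kept to a single check.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Kernel from the question: one thread handles at most one element.
__global__ void add(int n, float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = x[i] + y[i];
}

int main()
{
    const int n = 10000000;      // 10,000,000 elements
    const int grid_size = 64;
    const int block_size = 256;  // 64 * 256 = 16384 threads in total

    // Fill values are arbitrary (assumption): x = 1, y = 2, so an updated y is 3.
    std::vector<float> hx(n, 1.0f), hy(n, 2.0f);
    const size_t bytes = n * sizeof(float);

    float *dx = nullptr, *dy = nullptr;
    cudaMalloc(&dx, bytes);
    cudaMalloc(&dy, bytes);
    cudaMemcpy(dx, hx.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy.data(), bytes, cudaMemcpyHostToDevice);

    add<<<grid_size, block_size>>>(n, dx, dy);

    // This blocking copy also waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(hy.data(), dy, bytes, cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Count elements that hold the sum (3) and elements still at their initial value (2).
    long updated = 0, untouched = 0;
    for (int i = 0; i < n; ++i) {
        if (hy[i] == 3.0f) ++updated;
        else if (hy[i] == 2.0f) ++untouched;
    }
    printf("updated: %ld, untouched: %ld\n", updated, untouched);

    cudaFree(dx);
    cudaFree(dy);
    return 0;
}
```

If the launch succeeds, the updated count should equal the number of launched threads (16384), with every remaining element still at its initial value.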

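For this reason (to conveniently handle arbitrary data set sizes independent of the chosen grid size), people sometimes suggest a grid-stride loop kernel design, in which each thread processes several elements, stepping through the array by the total number of threads in the grid.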
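
A grid-stride loop version of the kernel might look like the sketch below. The kernel name add_grid_stride, the fill values, and the host-side checking code are assumptions for illustration. The launch configuration (64 blocks of 256 threads) and the element count are taken from the question.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Grid-stride loop: each thread starts at its global index and then jumps
// ahead by the total number of threads in the grid until it passes n.
__global__ void add_grid_stride(int n, const float *x, float *y)
{
    int stride = gridDim.x * blockDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        y[i] = x[i] + y[i];
}

int main()
{
    const int n = 10000000;      // 10,000,000 elements
    const int grid_size = 64;
    const int block_size = 256;  // still only 16384 threads

    // Fill values are arbitrary (assumption): x = 1, y = 2, so every y should become 3.
    std::vector<float> hx(n, 1.0f), hy(n, 2.0f);
    const size_t bytes = n * sizeof(float);

    float *dx = nullptr, *dy = nullptr;
    cudaMalloc(&dx, bytes);
    cudaMalloc(&dy, bytes);
    cudaMemcpy(dx, hx.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy.data(), bytes, cudaMemcpyHostToDevice);

    add_grid_stride<<<grid_size, block_size>>>(n, dx, dy);

    // This blocking copy also waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(hy.data(), dy, bytes, cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Count elements that do not hold the expected sum.
    long wrong = 0;
    for (int i = 0; i < n; ++i)
        if (hy[i] != 3.0f) ++wrong;
    printf("elements not updated: %ld\n", wrong);

    cudaFree(dx);
    cudaFree(dy);
    return 0;
}
```

With this design the same 16384 threads cover the whole array, because each thread handles roughly n / 16384 elements instead of one. The grid size then becomes a tuning parameter and no longer affects correctness.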