I am new to CUDA and still trying to figure things out, so this question may be dumb, but I can't seem to figure out the problem, so bear with me.
I am trying to copy a 2D array to the GPU. The size of the array is N*N (a square array). I'm trying to copy it using cudaMallocPitch() and cudaMemcpy2D(). The problem is that I seem to be copying only the first row of the array and nothing else. I can't find what exactly I'm doing wrong.
My code:
void function(){
double A[N][N];
//code to fill out the array.
double* d_A;
size_t pitch;
cudaMallocPitch(&d_A, &pitch, N * sizeof(double), N);
cudaMemcpy2D(d_A, pitch, A, N * sizeof(double) , N * sizeof(double), N, cudaMemcpyHostToDevice);
int threadnum = 1;
int blocknum = 1;
kernal_print<<<blocknum, threadnum>>>(d_A, N);
//copying back to host & freeing up memory
}
__global__ void kernal_print(double* d_A, N){
int xIdx = threadIdx.x + blockDim.x * blockIdx.x;
int yIdx = threadIdx.y + blockDim.y * blockIdx.y;
printf("\n");
for(int i = 0; i < N*N; i++){
printf("%f, ",d_A[i]);
}
printf("\n");
}
The code above will only print the first row of whatever matrix I have. So for example a 3x3 matrix that looks like this:
1 2 3
4 5 6
7 8 9
the code will print (1 2 3 0 0 0 0 0 0)
Any idea what I am doing wrong? Thanks in advance!
CodePudding user response:
This question may be useful for background.
Perhaps you don't know what a pitched allocation is. A pitched allocation looks like this:
X X X P P P
X X X P P P
X X X P P P
The above could represent storage for a 3x3 array (elements represented by X) that is pitched (pitch value of 6 elements, pitch "elements" represented by P).
You'll have no luck accessing such a storage arrangement if you don't follow the guidelines given in the reference manual for cudaMallocPitch. In-kernel access to such a pitched allocation should be done as follows:
T* pElement = (T*)((char*)BaseAddress + Row * pitch) + Column;
You'll note that the above formula depends on the pitch value that was provided at the point of cudaMallocPitch. If you don't pass that value to your kernel, you won't have any luck with this.
Because you are not doing that, the proximal reason for your observation:
the code will print (1 2 3 0 0 0 0 0 0)
is that your indexing is reading just the first "row" of that pitched allocation, and the P elements are showing up as zero (although that's not guaranteed).
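To see why the flat loop goes wrong, here is a host-only sketch in plain C++ (no GPU needed) that imitates the 6-element pitch from the diagram above. The names pitched_element, make_pitched_3x3, read_flat and read_pitched are made up for this illustration, and the pitch of 6 doubles is an assumption; the value actually returned by cudaMallocPitch depends on the device and is often much larger, in which case every flat read past the first row can land in padding.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-only illustration: no CUDA calls, just the address arithmetic.
// kPitchBytes is an assumed pitch (6 doubles per row); a real pitch
// comes back from cudaMallocPitch.
constexpr std::size_t kN = 3;
constexpr std::size_t kPitchBytes = 6 * sizeof(double);

// Reference-manual formula: step to the row in BYTES, then to the column in elements.
double* pitched_element(void* base, std::size_t pitch, std::size_t row, std::size_t col) {
    return (double*)((char*)base + row * pitch) + col;
}

// Build a 3x3 matrix holding 1..9 inside a pitched buffer with zero-filled padding.
std::vector<double> make_pitched_3x3() {
    std::vector<double> buf(kN * kPitchBytes / sizeof(double), 0.0);
    for (std::size_t row = 0; row < kN; row++)
        for (std::size_t col = 0; col < kN; col++)
            *pitched_element(buf.data(), kPitchBytes, row, col) = row * kN + col + 1;
    return buf;
}

// What the question's kernel does: read the first N*N values, ignoring the pitch.
std::vector<double> read_flat(const std::vector<double>& buf) {
    assert(buf.size() >= kN * kN);
    return std::vector<double>(buf.begin(), buf.begin() + kN * kN);
}

// What the corrected kernel does: honor the pitch for every row.
std::vector<double> read_pitched(std::vector<double>& buf) {
    std::vector<double> out;
    for (std::size_t row = 0; row < kN; row++)
        for (std::size_t col = 0; col < kN; col++)
            out.push_back(*pitched_element(buf.data(), kPitchBytes, row, col));
    return out;
}
```

With this assumed pitch, read_flat returns 1 2 3 0 0 0 4 5 6 (the padding shows up in the middle), while read_pitched returns 1 through 9.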
We can fix your code simply by implementing the suggestions given in the reference manual:
$ cat t2153.cu
#include <cstdio>
const size_t N = 3;
__global__ void kernal_print(double* d_A, size_t my_N, size_t pitch){
// int xIdx = threadIdx.x + blockDim.x * blockIdx.x;
// int yIdx = threadIdx.y + blockDim.y * blockIdx.y;
printf("\n");
for(int row = 0; row < my_N; row++)
for (int col = 0; col < my_N; col++){
double* pElement = (double *)((char*)d_A + row * pitch) + col;
printf("%f, ",*pElement);
}
printf("\n");
}
void function(){
double A[N][N];
for (size_t row = 0; row < N; row++)
for (size_t col = 0; col < N; col++)
A[row][col] = row*N + col + 1;
double* d_A;
size_t pitch;
cudaMallocPitch(&d_A, &pitch, N * sizeof(double), N);
cudaMemcpy2D(d_A, pitch, A, N * sizeof(double) , N * sizeof(double), N, cudaMemcpyHostToDevice);
int threadnum = 1;
int blocknum = 1;
kernal_print<<<blocknum, threadnum>>>(d_A, N, pitch);
cudaDeviceSynchronize();
}
int main(){
function();
}
$ nvcc -o t2153 t2153.cu
$ compute-sanitizer ./t2153
========= COMPUTE-SANITIZER
1.000000, 2.000000, 3.000000, 4.000000, 5.000000, 6.000000, 7.000000, 8.000000, 9.000000,
========= ERROR SUMMARY: 0 errors
$
A few comments:
- The usage of the term 2D can have varied interpretations.
- Using a pitched allocation is not necessary for 2D work, and it may also have no practical value (not making your code simpler or more performant).
- For further discussion of the varied ways of doing "2D work", please read the answer I linked.
- This sort of allocation:
double A[N][N];
may give you trouble for large N, because it is a stack-based allocation. Instead, use a dynamic allocation (which may affect a number of the methods you use to handle it). There are various questions covering this, such as this one.
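As a rough sketch of that last point, one common way to get a dynamic allocation is a single contiguous std::vector in row-major order. This is host-only C++; make_matrix, at and host_pitch_bytes are hypothetical helpers invented for this example, not part of the CUDA API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: an n x n matrix on the heap, stored row-major in one
// contiguous block and filled with 1, 2, 3, ... like the example above.
std::vector<double> make_matrix(std::size_t n) {
    std::vector<double> A(n * n);
    for (std::size_t row = 0; row < n; row++)
        for (std::size_t col = 0; col < n; col++)
            A[row * n + col] = row * n + col + 1;
    return A;
}

// Replaces A[row][col]: the row stride is n elements because there is no padding.
double& at(std::vector<double>& A, std::size_t n, std::size_t row, std::size_t col) {
    assert(row < n && col < n);
    return A[row * n + col];
}

// Source pitch in bytes for a 2D copy: host rows are contiguous, so the pitch
// is just the row width.
std::size_t host_pitch_bytes(std::size_t n) {
    return n * sizeof(double);
}

// With CUDA available, the copy from the code above would then become:
//   cudaMemcpy2D(d_A, pitch, A.data(), host_pitch_bytes(n),
//                n * sizeof(double), n, cudaMemcpyHostToDevice);
```

Because the storage lives on the heap, n can grow well past what a stack array allows, and A.data() gives the plain pointer that the copy call expects.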