I was looking at CUDA C documentation. But there is something I didn't get about pointer typecasting. Below there are host and device code.
// Host code
int width = 64, height = 64;
float* devPtr;
size_t pitch;
cudaMallocPitch(&devPtr, &pitch,
width * sizeof(float), height);
MyKernel<<<100, 512>>>(devPtr, pitch, width, height);
// Device code
__global__ void MyKernel(float* devPtr,
size_t pitch, int width, int height)
{
for (int r = 0; r < height; r) {
float* row = (float*)((char*)devPtr r * pitch);
for (int c = 0; c < width; c) {
float element = row[c];
}
}
}
As you can see devPtr is typecasted into char. But I didn't get why typecasted into char rather than incrementing as float type.
CodePudding user response:
This is to handle a pitched allocation (the type created by cudaMallocPitch()
).
A pitched allocation "rounds up" the requested width of the allocation to a particular pitch, and this pitch is specified in bytes:
cudaMallocPitch(&devPtr, &pitch,
^
|
this value is indicated by the function as a row width or "pitch" in bytes
Because the pitch is specified in bytes, to get proper pointer arithmetic:
((char*)devPtr r * pitch);
^
|
pointer arithmetic
the pointer type must also be a byte-type. The objective of that code snippet is to increment devPtr
by a number of rows specified by r
, each row consists of pitch
bytes.
AFAIK, in CUDA, there is nothing that guarantees any particular granularity of pitch
as returned by cudaMallocPitch
. It is theoretically possible for it to be an odd number of bytes, or a prime number of bytes, for example. So playing tricks to pre-convert the pitch
value to an equivalent (pointer arithmetic) offset in other type-widths would be frowned on.