CUDA C Pointer Typecasting-CodePudding

I was looking at CUDA C documentation. But there is something I didn't get about pointer typecasting. Below there are host and device code.

// Host code
int width = 64, height = 64;
float* devPtr;
size_t pitch;
cudaMallocPitch(&devPtr, &pitch,
                width * sizeof(float), height);
MyKernel<<<100, 512>>>(devPtr, pitch, width, height);

// Device code
__global__ void MyKernel(float* devPtr,
                         size_t pitch, int width, int height)
{
    for (int r = 0; r < height;   r) {
        float* row = (float*)((char*)devPtr   r * pitch);
        for (int c = 0; c < width;   c) {
            float element = row[c];
        }
    }
}

As you can see devPtr is typecasted into char. But I didn't get why typecasted into char rather than incrementing as float type.

CodePudding user response：

This is to handle a pitched allocation (the type created by cudaMallocPitch()).

A pitched allocation "rounds up" the requested width of the allocation to a particular pitch, and this pitch is specified in bytes:

cudaMallocPitch(&devPtr, &pitch,
                          ^
                          | 
               this value is indicated by the function as a row width or "pitch" in bytes

Because the pitch is specified in bytes, to get proper pointer arithmetic:

((char*)devPtr   r * pitch);
               ^
               |
           pointer arithmetic

the pointer type must also be a byte-type. The objective of that code snippet is to increment devPtr by a number of rows specified by r, each row consists of pitch bytes.

AFAIK, in CUDA, there is nothing that guarantees any particular granularity of pitch as returned by cudaMallocPitch. It is theoretically possible for it to be an odd number of bytes, or a prime number of bytes, for example. So playing tricks to pre-convert the pitch value to an equivalent (pointer arithmetic) offset in other type-widths would be frowned on.