an illegal memory access was encountered

I am a beginner at CUDA programming, writing a program composed of a single file main.cu which is shown below.

#include <iostream>
#include <opencv2/opencv.hpp>

#define DEBUG(str) std::cerr << "\033[1;37m" << __FILE__ << ":" << __LINE__ << ": \033[1;31merror:\033[0m " << str << std::endl;

#define CUDADEBUG(cudaError)      \
    if (cudaError != cudaSuccess) \
        DEBUG(cudaGetErrorString(cudaError));

#define ERROR(str)  \
    {               \
        DEBUG(str); \
        exit(1);    \
    }

__global__ void makeGrey(
    unsigned char *&pimage,
    const int &cn,
    const size_t &total)
{
    unsigned i = blockDim.x * blockIdx.x + threadIdx.x;
    unsigned icn = i * cn;

    printf("%u\n", i);

    if (i < total)
    {
        float result = pimage[icn + 0] * .114 +
                       pimage[icn + 1] * .587 +
                       pimage[icn + 2] * .299;
        pimage[icn + 0] = result; //B
        pimage[icn + 1] = result; //G
        pimage[icn + 2] = result; //R
        // pimage[icn + 3] *= result; //A
    }
}

int main(int argc, char **argv)
{
    if (argc != 3)
        ERROR("usage: executable in out");

    cv::Mat image;
    unsigned char *dimage;

    image = cv::imread(argv[1], cv::IMREAD_UNCHANGED);
    if (!image.data)
        ERROR("Image null");

    if (image.empty())
        ERROR("Image empty");

    if (!image.isContinuous())
        ERROR("image is not continuous");

    const size_t N = image.total();
    const int cn = image.channels();
    const size_t numOfElems = cn * N;
    const int blockSize = 512;
    const int gridSize = (N - 1) / blockSize + 1;

    CUDADEBUG(cudaMalloc(&dimage, numOfElems * sizeof(unsigned char)));
    CUDADEBUG(cudaMemcpy(dimage, image.data, numOfElems * sizeof(unsigned char), cudaMemcpyHostToDevice));

    makeGrey<<<gridSize, blockSize>>>(dimage, cn, N);
    cudaError_t errSync = cudaGetLastError();
    cudaError_t errAsync = cudaDeviceSynchronize();
    if (errSync != cudaSuccess)
        std::cerr << "Sync kernel error: " << cudaGetErrorString(errSync) << std::endl;
    if (errAsync != cudaSuccess)
        std::cerr << "Async kernel error: " << cudaGetErrorString(errAsync) << std::endl;

    CUDADEBUG(cudaMemcpy(image.data, dimage, numOfElems * sizeof(unsigned char), cudaMemcpyDeviceToHost)); //line 73
    CUDADEBUG(cudaFree(dimage));                                                                           //line 74

    cv::imwrite(argv[2], image);
    return 0;
}

When I execute the program, I get

Async kernel error: an illegal memory access was encountered
/path-to-main.cu:73: error: an illegal memory access was encountered
/path-to-main.cu:74: error: an illegal memory access was encountered

I checked the CV_VERSION macro, which is 4.5.3-dev, and CUDA Toolkit 11.4 is installed (nvcc reports version 11.4). Also, as far as I can tell, the kernel does not execute at all (I used the Nsight debugger and printf). I could not understand why I am accessing an illegal memory area. I appreciate any help. Thank you in advance.

CodePudding user response:

For future readers: as Kaldrr pointed out, I had to remove the references and pass the arguments by value. I thought using references would work because cudaMalloc supports references. In short, changing the parameters in the kernel definition as shown below fixed the problem:

__global__ void makeGrey(
    unsigned char *pimage,
    const int cn,
    const size_t total)
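
For completeness, here is the whole kernel with the parameters taken by value (the body is unchanged from the question, minus the debug printf); the launch makeGrey<<<gridSize, blockSize>>>(dimage, cn, N) and the rest of main stay exactly as they were:

__global__ void makeGrey(
    unsigned char *pimage,
    const int cn,
    const size_t total)
{
    // Same body as before: only the parameter declarations changed.
    unsigned i = blockDim.x * blockIdx.x + threadIdx.x;
    unsigned icn = i * cn;

    if (i < total)
    {
        float result = pimage[icn + 0] * .114 +
                       pimage[icn + 1] * .587 +
                       pimage[icn + 2] * .299;
        pimage[icn + 0] = result; //B
        pimage[icn + 1] = result; //G
        pimage[icn + 2] = result; //R
    }
}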

NOTE: If Kaldrr posts his answer to this question, I will accept it since the original answer belongs to him.

CodePudding user response:

As mentioned in a comment, your GPU function takes its arguments by reference.

__global__ void makeGrey(
    unsigned char *&pimage,
    const int &cn,
    const size_t &total)

This is a problem: passing a reference to a function means, more or less, that you're passing an address where the value can be found, not the value itself. In your situation those values live in host memory, NOT in device/GPU memory, so when the GPU tries to access them it will most likely crash.

The types you are trying to pass, unsigned char*, int, and size_t, are very cheap to copy, so there is no need to pass them by reference in the first place.

__global__ void makeGrey(
    unsigned char *pimage,
    const int cn,
    const size_t total)

There are tools provided by NVIDIA to debug CUDA applications, but I'm not really familiar with them. You can also use printf inside GPU functions, but you will have to organize the output from potentially thousands of threads.
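
For example, one simple way to keep device-side printf usable (a sketch with a hypothetical debugPrint kernel, not code from the question) is to guard the call so that only a single thread prints; the compute-sanitizer tool that ships with CUDA 11.x can also pinpoint which access inside a kernel is illegal.

#include <cstdio>

// Sketch: print the launch parameters once, from the first thread of the
// grid, instead of once per thread.
__global__ void debugPrint(const int cn, const size_t total)
{
    if (blockIdx.x == 0 && threadIdx.x == 0)
        printf("cn = %d, total = %llu\n", cn, (unsigned long long)total);
}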

In general, whenever you call GPU functions, be very careful about what you pass as parameters, as they need to be copied from host memory to device memory. Usually you want to pass everything by value, any pointers need to point to device memory, and watch out for references.
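
As a small sketch of that last point, reusing the names from the question purely for illustration:

// Correct: dimage holds an address returned by cudaMalloc, and the pointer
// itself is copied by value at launch time.
makeGrey<<<gridSize, blockSize>>>(dimage, cn, N);

// Wrong: image.data is a host pointer, so dereferencing it in the kernel
// would also produce an illegal memory access.
// makeGrey<<<gridSize, blockSize>>>(image.data, cn, N);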
