CUDA: cudaMemcpyDeviceToHost into a std::vector [closed]

I have a std::vector that needs to receive data computed by a CUDA kernel. However, I'm running into pointer and type conflicts when copying it, and I don't know where to start.

std::vector<int> values;
cudaMemcpy(values.data(), values_return, size, cudaMemcpyDeviceToHost);

I can partly see why this isn't allowed, but I have no idea how to actually achieve what I want.

cudaMemcpy expects a pointer as its first argument, so simply passing values as the first argument doesn't work either.

When I run make, I get: error: expression must have pointer type

Answer:

From the CUDA Runtime API docs:

__host__ cudaError_t cudaMemcpy ( void* dst, const void* src, size_t count, cudaMemcpyKind kind )
Copies data between host and device.

The first argument requires a void* pointer. You have std::vector<int> values;

Thus you are passing an integer. I don't know what values_return is; you need to ensure that values_return is a void *.

You need to change std::vector<int> values; to std::vector<void *> values;

This makes more sense:

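#include <cstdlib>  // malloc, free
#include <vector>

std::vector<void *> values(1);  // one slot, so values[0] is valid (an empty vector has none)
values[0] = malloc(size);       // host buffer of size bytes
cudaMemcpy(values[0], values_return, size, cudaMemcpyDeviceToHost);

// Free the entry
free(values[0]);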

Answer:

There are a few things to be aware of when copying from the device into a vector:

- The vector must be large enough to hold all elements that will be transferred to the host.
- The first argument of cudaMemcpy specifies the location where the first copied element will be stored; consecutive elements are stored in consecutive locations. For a vector, the address of the first element is vector.data(), which is equivalent to &vector[0] for a non-empty vector.
- The third argument of cudaMemcpy specifies the number of bytes to copy, not the number of elements. To transfer 4 ints, use sizeof(int) * 4, not 4.
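The three points above can be bundled into a small helper that sizes the vector, takes the address of its first element and converts the element count to bytes. This is a minimal sketch: the name copyDeviceToVector is hypothetical (it is not part of the CUDA API), and it assumes d_src points to at least count elements of device memory.

```cuda
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper (not a CUDA API): copies `count` elements of type T
// from device memory at d_src into `out`, resizing `out` first.
template <typename T>
cudaError_t copyDeviceToVector(std::vector<T>& out, const T* d_src, std::size_t count)
{
    out.resize(count);                      // 1. make room for every element
    if (count == 0) return cudaSuccess;     // nothing to copy; data() may be null
    return cudaMemcpy(out.data(),           // 2. address of the first element
                      d_src,
                      count * sizeof(T),    // 3. size in bytes, not elements
                      cudaMemcpyDeviceToHost);
}
```

Assuming values_return in the question is an int* holding n ints (the question does not say), the call would be copyDeviceToVector(values, values_return, n), and the return value should be compared against cudaSuccess.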

Simple example program:

#include <vector>
#include <cassert>
#include <cstddef>          // std::size_t

#include <cuda_runtime.h>   // cudaMalloc, cudaMemcpy, cudaFree
#include <thrust/fill.h>
#include <thrust/execution_policy.h>

int main(){
        constexpr std::size_t numElements = 512;
        constexpr std::size_t numBytes = sizeof(int) * numElements;

        int* d_values;
        auto status = cudaMalloc(&d_values, numBytes);
        assert(status == cudaSuccess);
        // Fill every device element with 42
        thrust::fill(thrust::device, d_values, d_values + numElements, 42);

        std::vector<int> values(numElements, 0);

        status = cudaMemcpy(values.data(), d_values, numBytes, cudaMemcpyDeviceToHost);
        assert(status == cudaSuccess);

        for(int i : values){
                assert(i == 42);
        }

        cudaFree(d_values);
}
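The example above fills the buffer with Thrust and checks errors with assert, which is compiled out when NDEBUG is defined. The sketch below instead fills the buffer with a plain kernel, standing in for the kernel from the question, and checks every runtime call explicitly. The names fillSquares, checkCuda and squaresFromDevice are hypothetical, chosen only for illustration.

```cuda
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>
#include <cuda_runtime.h>

// Throws if a CUDA runtime call failed; unlike assert, this check
// stays active in release (NDEBUG) builds.
static void checkCuda(cudaError_t err, const char* what)
{
    if (err != cudaSuccess) {
        throw std::runtime_error(std::string(what) + ": " + cudaGetErrorString(err));
    }
}

// Hypothetical kernel: element i receives i * i (no overflow for n <= 46341).
__global__ void fillSquares(int* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i * i;
}

// Runs the kernel on n elements and returns the results in a std::vector.
std::vector<int> squaresFromDevice(int n)
{
    if (n <= 0) return {};  // a launch with zero blocks would be an error

    const std::size_t numBytes = sizeof(int) * static_cast<std::size_t>(n);
    int* d_out = nullptr;
    checkCuda(cudaMalloc(&d_out, numBytes), "cudaMalloc");

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    fillSquares<<<blocks, threads>>>(d_out, n);
    checkCuda(cudaGetLastError(), "kernel launch");

    std::vector<int> values(n);  // sized before the copy
    // cudaMemcpy on the default stream waits for the kernel to finish first.
    checkCuda(cudaMemcpy(values.data(), d_out, numBytes, cudaMemcpyDeviceToHost),
              "cudaMemcpy");

    checkCuda(cudaFree(d_out), "cudaFree");
    return values;
}
```

Calling squaresFromDevice(1000) yields a vector whose element 7 is 49. To keep the sketch short, d_out is not freed when an exception is thrown; real code would release it in that path too, for example with an RAII wrapper.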