I am trying to allocate several SVM buffers and pass them to an OpenCL kernel using the method below. The kernel runs on an Intel HD Graphics 530 and an NVIDIA GTX 950M. I get different results on the two GPUs, and I am not sure which behavior is correct (maybe both?).
- Initialize OpenCL.
- Allocate 3 buffers (data0, data1, pointers) with 3 clSVMAlloc() calls.
- Check the results for NULL (no failure here).
- Map all the buffers using blocking clEnqueueSVMMap() calls; check for CL_SUCCESS (no failure here).
- Fill data0 and data1 with data, and pointers with pointers to data0 and data1, so that pointers[0] == data0 and pointers[1] == data1.
- Unmap all the buffers using clEnqueueSVMUnmap(); check for CL_SUCCESS (no failure here).
- Set the 1st kernel argument using clSetKernelArgSVMPointer(..., data0); check for CL_SUCCESS (no failure here).
- Set the 2nd kernel argument using clSetKernelArgSVMPointer(..., pointers); check for CL_SUCCESS (no failure here).
- Run the kernel and test results.
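To make the layout in the steps above concrete, here is a host-only model in plain C. It is a sketch under assumptions: ordinary arrays stand in for the clSVMAlloc() allocations, and fill_pointer_table() and read_through_table() are hypothetical names, the latter mimicking what a kernel does when it dereferences pointers[i]. No OpenCL call is made.

```c
#include <stddef.h>

/* Fill step from the list: store the buffer addresses in the table
 * so that pointers[0] == data0 and pointers[1] == data1. */
static void fill_pointer_table(float **pointers, float *data0, float *data1)
{
    pointers[0] = data0;
    pointers[1] = data1;
}

/* Hypothetical stand-in for the kernel body: read element `idx` of
 * buffer `which` through the table, i.e. pointers[which][idx]. */
static float read_through_table(float *const *pointers, size_t which, size_t idx)
{
    return pointers[which][idx];
}
```

In this model both buffers are always readable through the table; the question is whether the device is guaranteed to see the same thing for a buffer that was never set as a kernel argument.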
On the NVIDIA card, the data is available through both pointers[0] and pointers[1]. This is the expected behavior.
On the Intel chip, the memory behind pointers[0] is available (because data0 has been set as a kernel argument), but the memory behind pointers[1] is all zeroes. If I set data1 as the kernel argument instead, it becomes available through pointers[1] and the memory behind pointers[0] becomes zeroes.
The question is: is the Intel behavior a bug or a feature? I have not found any related information in the OpenCL 3.0 specification.
Answer:
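The Intel behavior is correct and adheres to the specification. The NVIDIA implementation is simply more forgiving.
To fix the problem, tell the runtime about every SVM buffer that the kernel reaches indirectly (that is, through a stored pointer rather than as a kernel argument) by passing an array of those buffers to the kernel: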
ret = clSetKernelExecInfo(kernel, CL_KERNEL_EXEC_INFO_SVM_PTRS, sizeof(svm_buffers), svm_buffers);
where svm_buffers is a void*[] array of pointers to buffers allocated using clSVMAlloc() calls.
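One detail worth checking in the call above: the third argument is a size in bytes, and sizeof(svm_buffers) gives the right value only while svm_buffers is a real array in scope. Once the array has been passed to another function it is a plain pointer, and sizeof yields the size of a single pointer. A minimal sketch of a helper that sidesteps this, where exec_info_bytes is a hypothetical name and not part of OpenCL:

```c
#include <stddef.h>

/* Hypothetical helper: byte size to pass as param_value_size
 * for a list of `count` SVM pointers. */
static size_t exec_info_bytes(size_t count)
{
    return count * sizeof(void *);
}

/* Possible use (not compiled here, needs the OpenCL headers):
 *   clSetKernelExecInfo(kernel, CL_KERNEL_EXEC_INFO_SVM_PTRS,
 *                       exec_info_bytes(count), svm_buffers);
 */
```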
The OpenCL specification states the following for clSetKernelExecInfo with parameter CL_KERNEL_EXEC_INFO_SVM_PTRS:
SVM pointers must reference locations contained entirely within buffers that are passed to kernel as arguments, or that are passed through the execution information. Non-argument SVM buffers must be specified by passing pointers to those buffers via clSetKernelExecInfo for coarse-grain and fine-grain buffer SVM allocations but not for fine-grain system SVM allocations.
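Following the rule quoted above, the buffers that must go into the list are the ones that are not already kernel arguments. As a rough sketch, such a list could be built as follows; collect_non_argument_buffers is a hypothetical helper, and the comparison assumes every entry is the base pointer returned by clSVMAlloc().

```c
#include <stddef.h>

/* Copy into `out` every pointer from `all` that does not appear in `args`.
 * Returns the number of pointers written; `out` needs room for n_all entries. */
static size_t collect_non_argument_buffers(void *const *all, size_t n_all,
                                           void *const *args, size_t n_args,
                                           void **out)
{
    size_t n_out = 0;
    for (size_t i = 0; i < n_all; ++i) {
        int is_argument = 0;
        for (size_t j = 0; j < n_args; ++j) {
            if (all[i] == args[j]) {
                is_argument = 1;
                break;
            }
        }
        if (!is_argument)
            out[n_out++] = all[i];
    }
    return n_out;
}
```

In the setup from the question, all would hold data0, data1 and pointers, args would hold data0 and pointers, and the result would be the single entry data1, which is the buffer to pass through clSetKernelExecInfo.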