I like how GpuMat can be passed into a kernel as PtrStepSz. I want to emulate the behavior in my own custom container, but I don't understand what's actually going on. How is GpuMat able to be accessed through PtrStepSz in the kernel? And how does PtrStepSz have members like .rows and .cols if CUDA can't use classes? What actually is PtrStepSz? I've been studying the source code but I'm having trouble contextualizing it.
CodePudding user response:
The reason that GpuMat can be converted to PtrStepSz is that the GpuMat class has an overloaded typecast operator which allows extraction of the core members of GpuMat (i.e. rows, cols, step and data). It can be seen in the GpuMat source code. The said operator is declared as a member of the GpuMat class as follows:
template <typename _Tp> operator PtrStepSz<_Tp>() const;
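To emulate this in a custom container, you can reproduce the same pattern: a small, trivially copyable "view" struct (the device-side analogue of PtrStepSz) plus a host-side owning class that defines a conversion operator to that view. The following is only a sketch, and the names DeviceView and MyGpuImage are made up for illustration:

#include <cstddef>
#include <cuda_runtime.h>

// device-side view: a plain struct that is cheap to copy by value into a kernel
template <typename T>
struct DeviceView
{
    __host__ __device__ DeviceView() : data(nullptr), step(0), rows(0), cols(0) {}
    __host__ __device__ DeviceView(T* data_, size_t step_, int rows_, int cols_)
        : data(data_), step(step_), rows(rows_), cols(cols_) {}

    // pitched, row-major access (step is in bytes)
    __host__ __device__ T* ptr(int y) const { return (T*)((char*)data + y * step); }
    __host__ __device__ T& operator()(int y, int x) const { return ptr(y)[x]; }

    T*     data;
    size_t step;   // bytes per row, as returned by cudaMallocPitch
    int    rows;
    int    cols;
};

// host-side owner, analogous to GpuMat
template <typename T>
class MyGpuImage
{
public:
    MyGpuImage(int rows, int cols) : rows_(rows), cols_(cols)
    {
        cudaMallocPitch((void**)&data_, &step_, cols * sizeof(T), rows);
    }
    ~MyGpuImage() { cudaFree(data_); }

    // the same trick GpuMat uses: an implicit conversion operator that
    // hands out a non-owning view of the device memory
    operator DeviceView<T>() const { return DeviceView<T>(data_, step_, rows_, cols_); }

private:
    T*     data_ = nullptr;
    size_t step_ = 0;
    int    rows_, cols_;
};

// a kernel can now take the view by value, exactly like PtrStepSz
__global__ void fill(DeviceView<float> img, float value)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < img.cols && y < img.rows)
        img(y, x) = value;
}

With this in place, a MyGpuImage<float> can be passed directly to fill(...): the conversion operator produces the lightweight view on the host, and the kernel launch copies that view to the device by value.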
Coming to the second question: CUDA device code can use classes and structs, and objects can even be constructed inside a kernel, as long as the constructor, the destructor and any member functions called on the device are decorated with the __device__ (or __host__ __device__) qualifier. So the assumption that CUDA cannot use classes is incorrect.
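As a minimal, hypothetical illustration (not OpenCV code), the following struct can be constructed and used inside a kernel because its members carry the required qualifiers:

#include <cuda_runtime.h>

struct Counter
{
    __host__ __device__ Counter() : value(0) {}
    __device__ void bump() { ++value; }
    int value;
};

__global__ void useCounter(int* out)
{
    Counter c;          // constructed inside the kernel
    c.bump();           // __device__ member function called on the device
    out[threadIdx.x] = c.value;
}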
In the OpenCV source code, PtrStepSz is defined as follows in the file cuda_types.hpp:
template <typename T> struct PtrStepSz : public PtrStep<T>
{
    __CV_CUDA_HOST_DEVICE__ PtrStepSz() : cols(0), rows(0) {}
    __CV_CUDA_HOST_DEVICE__ PtrStepSz(int rows_, int cols_, T* data_, size_t step_)
        : PtrStep<T>(data_, step_), cols(cols_), rows(rows_) {}

    template <typename U>
    explicit PtrStepSz(const PtrStepSz<U>& d) : PtrStep<T>((T*)d.data, d.step), cols(d.cols), rows(d.rows) {}

    int cols;
    int rows;
};
It is just a lightweight wrapper that encapsulates image information, as the comment in cuda_types.hpp puts it:

// Simple lightweight structures that encapsulates information about an image on device.

Note how the constructors are decorated with the __CV_CUDA_HOST_DEVICE__ macro, which expands to __host__ __device__ when the code is compiled as CUDA, so the objects can be created on the host as well as on the device.
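Once a GpuMat has been converted to PtrStepSz, the view is passed to the kernel by value and indexed through the operator() that PtrStepSz inherits from PtrStep. A rough sketch of typical usage (the kernel name and launch configuration are made up for illustration):

#include <opencv2/core/cuda.hpp>

__global__ void invert(cv::cuda::PtrStepSz<unsigned char> img)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < img.cols && y < img.rows)
        img(y, x) = 255 - img(y, x);   // operator()(y, x) is inherited from PtrStep<T>
}

void run()
{
    // host side: the GpuMat is converted to PtrStepSz<unsigned char> automatically
    // through the typecast operator shown above
    cv::cuda::GpuMat d_img(480, 640, CV_8UC1);
    dim3 block(32, 8);
    dim3 grid((d_img.cols + block.x - 1) / block.x,
              (d_img.rows + block.y - 1) / block.y);
    invert<<<grid, block>>>(d_img);
}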