I am working on an NVIDIA Jetson Orin and trying to enable CUDA support in OpenCV's DNN module.
My system specifications: Jetson Orin, Ubuntu 20.04, CUDA 11.4, cuDNN 8.6.0, OpenCV 4.5.4, YOLOv3.
I run the code from https://learnopencv.com/deep-learning-based-object-detection-using-yolov3-with-opencv-python-c/ either as root on the Jetson Orin (Linux) or remotely from Visual Studio 2019. Object detection inference works, but it falls back to the CPU instead of using the CUDA GPU backend. I compiled OpenCV with CUDA, cuDNN, OPENCV_DNN_CUDA, BUILD_OPENCV_DNN and other modules enabled.
My cmake command line:
…:~$ cd tk_ws/opencv-4.5.4/build/
…/tk_ws/opencv-4.5.4/build$ cmake -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/usr/local -D WITH_TBB=ON -D ENABLE_FAST_MATH=ON -D CUDA_FAST_MATH=ON -D WITH_CUBLAS=ON -D WITH_CUDA=ON -D CUDA_ARCH_BIN=8.7 -D BUILD_opencv_cudacodec=OFF -D WITH_CUDNN=ON -D OPENCV_DNN_CUDA=ON -D WITH_V4L=ON -D WITH_QT=OFF
-D WITH_OPENGL=ON -D WITH_GSTREAMER=ON -D OPENCV_GENERATE_PKGCONFIG=ON -D OPENCV_PC_FILE_NAME=opencv4.pc -D OPENCV_ENABLE_NONFREE=ON -D OPENCV_PYTHON3_INSTALL_PATH=/usr/lib/python3.8/dist-packages -D OPENCV_EXTRA_MODULES_PATH=/home/…/tk_ws/opencv_contrib-4.5.4/modules
-D INSTALL_PYTHON_EXAMPLES=OFF -D INSTALL_C_EXAMPLES=OFF -D BUILD_EXAMPLES=OFF -D BUILD_OPENCV_DNN=ON -D BUILD_OPENCV_WORLD=ON
-D CUDNN_VERSION=8.6.0 -D CUDNN_LIBRARY=/usr/lib/aarch64-linux-gnu/libcudnn.so -D CUDNN_INCLUDE_DIR=/usr/include -D PYTHON3_EXECUTABLE=/usr/bin/python3
-D PYTHON3_INCLUDE_DIR=/usr/include/python3.8 -D PYTHON3_LIBRARY=/usr/lib/aarch64-linux-gnu/libpython3.8.so -D PYTHON3_NUMPY_INCLUDE_DIRS=/usr/lib/python3/dist-packages/numpy/core/include
-D PYTHON3_PACKAGES_PATH=/usr/lib/python3/dist-packages -D OpenCV_DIR=/home/…/tk_ws/opencv-4.5.4/build ..
My configuration summary shows:
General configuration for OpenCV 4.5.4
Version control: unknown
NVIDIA CUDA: YES (ver 11.4, CUFFT CUBLAS FAST_MATH)
NVIDIA GPU arch: 87
NVIDIA PTX archs:
cuDNN: YES (ver 8.6.0)
However, the calls
net.setPreferableBackend(DNN_BACKEND_OPENCV); net.setPreferableTarget(DNN_TARGET_CPU);
produce this warning:
setUpNet DNN module was not built with CUDA backend; switching to CPU
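That warning is the one OpenCV's DNN module prints when the CUDA backend is requested from a library that lacks it, so the run that printed it presumably asked for DNN_BACKEND_CUDA / DNN_TARGET_CUDA rather than the values shown. Below is a minimal Python sketch of choosing the backend from what the loaded library reports. The helper names `pick_dnn_names` and `configure` are hypothetical and not part of the tutorial code.

```python
def pick_dnn_names(cuda_devices):
    """Return the cv2.dnn constant names to use for a given CUDA device count.

    cv2.cuda.getCudaEnabledDeviceCount() reports 0 when the loaded library was
    built without CUDA and -1 when the driver is missing or incompatible, so
    only a positive count selects the CUDA backend.
    """
    if cuda_devices > 0:
        return "DNN_BACKEND_CUDA", "DNN_TARGET_CUDA"
    return "DNN_BACKEND_OPENCV", "DNN_TARGET_CPU"


def configure(net):
    """Apply the chosen backend and target to a cv2.dnn network (hypothetical helper)."""
    import cv2  # imported lazily so pick_dnn_names works without OpenCV installed

    backend, target = pick_dnn_names(cv2.cuda.getCudaEnabledDeviceCount())
    net.setPreferableBackend(getattr(cv2.dnn, backend))
    net.setPreferableTarget(getattr(cv2.dnn, target))
    return backend, target


# What the helper selects when the loaded library reports no CUDA devices.
print(pick_dnn_names(0))
```

If `configure` returns the CPU pair on the Orin, the library loaded at runtime was not built with CUDA, whatever the build directory's configuration summary says.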
Test 1 (test.cpp) also fails:
$ g++ test.cpp `pkg-config opencv4 --cflags --libs` -o test
$ ./test
(opencv_dnn_cuda) ………:$ cd workspace/opencv-4.5.4
(opencv_dnn_cuda) ……….:/workspace/opencv-4.5.4$ g++ test.cpp `pkg-config opencv4 --cflags --libs` -o test
(opencv_dnn_cuda) ………..:~/workspace/opencv-4.5.4$ ./test
Error: OpenCV(4.5.4) /home/ubuntu/build_opencv/opencv/modules/core/include/opencv2/core/private.cuda.hpp:106: error: (-216:No CUDA support) The library is compiled without CUDA support in function ‘throw_no_cuda’
0.000296
For test 2, a Python script (test.py), my results are as follows.
In the virtual environment:
lenovo:~/workspace/opencv-4.5.4$ cd build/
lenovo:~/workspace/opencv-4.5.4/build$ workon opencv_dnn_cuda
(opencv_dnn_cuda) lenovo:~/workspace/opencv-4.5.4/build$ python3 test.py
CUDA using GPU --- 0.76786208152771 seconds ---
CPU --- 1.6167230606079102 seconds ---
Without the virtual environment:
lenovo:$ cd workspace/opencv-4.5.4/build
lenovo:/workspace/opencv-4.5.4/build$ python3 test.py
CUDA using GPU --- 1.4443564414978027 seconds ---
CPU --- 3.2225732803344727 seconds ---
The weird part in the first two situations is this warning:
[ WARN:0] global /home/ubuntu/build_opencv/opencv/modules/dnn/src/dnn.cpp (1447) setUpNet DNN module was not built with CUDA backend; switching to CPU
As mentioned above, the same odd path shows up here too: there is no directory /home/ubuntu/build_opencv/opencv/ on my machine.
My actual source path is /home/lenovo/workspace/opencv-4.5.4/modules/dnn/src/dnn.cpp
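The source path inside an OpenCV warning or error is fixed when the library is compiled, so it identifies which build produced the library that was actually loaded at runtime. Here is a small Python sketch that pulls that root out of a log line and compares it with the expected source tree. The helper name `build_root_from_message` is made up for illustration, and the regular expression assumes the path contains a `/modules/` component, as the messages in this post do.

```python
import re
from typing import Optional

# Matches the source-file path OpenCV embeds in its warnings and errors,
# e.g. ".../modules/dnn/src/dnn.cpp". Group 1 is everything before "/modules/".
_SOURCE_PATH = re.compile(r"(/\S+?)/modules/\S+\.(?:cpp|hpp|h)")


def build_root_from_message(message: str) -> Optional[str]:
    """Return the source root embedded in an OpenCV log line, or None if absent."""
    match = _SOURCE_PATH.search(message)
    return match.group(1) if match else None


# The warning quoted in the question.
warning = (
    "[ WARN:0] global /home/ubuntu/build_opencv/opencv/modules/dnn/src/dnn.cpp "
    "(1447) setUpNet DNN module was not built with CUDA backend; switching to CPU"
)
# The source tree the question says the library should have come from.
expected_root = "/home/lenovo/workspace/opencv-4.5.4"

root = build_root_from_message(warning)
print("library was compiled from:", root)
print("matches my source tree:", root == expected_root)
```

A mismatch like this one suggests the program is picking up a different OpenCV installation, one compiled on another machine, instead of the freshly built one.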
$ python3
Python 3.8.10 (default, Jun 22 2022, 20:18:18)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cv2
>>> cv2.__version__
'4.5.4'
>>> exit()
$ pkg-config --modversion opencv4
4.5.4
I would appreciate any help.
Thank you.
In addition, as suggested by @Micka, I set CUDA_ARCH_PTX="" and repeated the make and sudo make install steps of the OpenCV installation. I then tried sample.cpp to check whether CUDA is enabled.
Test 1: sample.cpp
#include <iostream>
#include "opencv2/opencv.hpp"
#include "opencv2/core/cuda.hpp"
#include "opencv2/cudaarithm.hpp"
#include "opencv2/cudaoptflow.hpp"
#include <opencv2/core/utility.hpp>
#include "opencv2/core.hpp"
int main (int argc, char* argv[])
{
    try
    {
        cv::cuda::GpuMat src_host = cv::imread("/home/...../test1.png", CV_LOAD_IMAGE_GRAYSCALE);
        cv::cuda::GpuMat dst, src;
        src.upload(src_host);
        cv::cuda::threshold(src, dst, 128.0, 255.0, CV_THRESH_BINARY);
        cv::cuda::GpuMat result_host;
        dst.download(result_host);
        cv::imshow("Result", result_host);
        cv::waitKey();
    }
    catch(const cv::Exception& ex)
    {
        std::cout << "Error: " << ex.what() << std::endl;
    }
    return 0;
}
Result:
g++ sample.cpp `pkg-config opencv4 --cflags --libs` -o sample
sample.cpp: In function ‘int main(int, char**)’:
sample.cpp:13:47: error: conversion from ‘cv::Mat’ to non-scalar type ‘cv::cuda::GpuMat’ requested
   13 | cv::cuda::GpuMat src_host = cv::imread("/home/sesotec-ai-2/darknet/test1.png", cv::IMREAD_GRAYSCALE);
      | ~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
sample.cpp:17:53: error: ‘CV_THRESH_BINARY’ was not declared in this scope
   17 | cv::cuda::threshold(src, dst, 128.0, 255.0, CV_THRESH_BINARY);
      | ^~~~~~~~~~~~~~~~
Test 2: test.cpp
#include <iostream>
#include <ctime>
#include <cmath>
#include "bits/time.h"
#include <opencv2/core.hpp>
#include <opencv2/opencv.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/core/cuda.hpp>
#include <opencv2/cudaarithm.hpp>
#include <opencv2/cudaimgproc.hpp>
#include "opencv2/core/cuda.hpp"
#include "opencv2/cudaarithm.hpp"
#include "opencv2/cudaoptflow.hpp"
#include <opencv2/core/utility.hpp>
#include "opencv2/core.hpp"
#include <opencv2/world.hpp>
#define TestCUDA true
using namespace cv;
using namespace cuda;
int main() {
    std::clock_t begin = std::clock();
    try {
        cv::String filename = "/home/.../.../test1.png";
        cv::Mat srcHost = cv::imread(filename, cv::IMREAD_GRAYSCALE);
        for(int i=0; i<1000; i++) {
            if(TestCUDA) {
                cv::cuda::GpuMat dst, src;
                src.upload(srcHost);
                //cv::cuda::threshold(src,dst,128.0,255.0, CV_THRESH_BINARY);
                cv::cuda::bilateralFilter(src,dst,3,1,1);
                cv::Mat resultHost;
                dst.download(resultHost);
            } else {
                cv::Mat dst;
                cv::bilateralFilter(srcHost,dst,3,1,1);
            }
        }
        //cv::imshow("Result",resultHost);
        //cv::waitKey();
    } catch(const cv::Exception& ex) {
        std::cout << "Error: " << ex.what() << std::endl;
    }
    std::clock_t end = std::clock();
    std::cout << double(end-begin) / CLOCKS_PER_SEC << std::endl;
}
Result:
$ g++ test.cpp `pkg-config opencv4 --cflags --libs` -o test
$ ./test
Error: OpenCV(4.5.4) /home/ubuntu/build_opencv/opencv/modules/core/include/opencv2/core/private.cuda.hpp:106: error: (-216:No CUDA support) The library is compiled without CUDA support in function 'throw_no_cuda'
0.047034
Build information, printed from test.py:
import cv2 as cv;
print(cv.getBuildInformation())
> $ python3 test.py
>
> CUDA using GPU --- 1.428098201751709 seconds ---
> CPU --- 3.2391397953033447 seconds ---
>
> General configuration for OpenCV 4.5.4 =====================================
> Version control: unknown
>
> Extra modules:
> Location (extra): /home/.../tk_ws/opencv_contrib-4.5.4/modules
> Version control (extra): unknown
>
> Platform:
> Timestamp: 2022-11-15T13:55:20Z
> Host: Linux 5.10.65-tegra aarch64
> CMake: 3.16.3
> CMake generator: Unix Makefiles
> CMake build tool: /usr/bin/make
> Configuration: Release
>
> CPU/HW features:
> Baseline: NEON FP16
>
> C/C++:
> Built as dynamic libs?: YES
> C++ standard: 11
> C++ Compiler: /usr/bin/c++ (ver 9.4.0)
> C++ flags (Release): -fsigned-char -ffast-math -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Wsuggest-override -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -fvisibility=hidden -fvisibility-inlines-hidden -fopenmp -O3 -DNDEBUG -DNDEBUG
> C++ flags (Debug): -fsigned-char -ffast-math -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Wsuggest-override -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -fvisibility=hidden -fvisibility-inlines-hidden -fopenmp -g -O0 -DDEBUG -D_DEBUG
> C Compiler: /usr/bin/cc
> C flags (Release): -fsigned-char -ffast-math -W -Wall -Werror=return-type -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -fvisibility=hidden -fopenmp -O3 -DNDEBUG -DNDEBUG
> C flags (Debug): -fsigned-char -ffast-math -W -Wall -Werror=return-type -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -fvisibility=hidden -fopenmp -g -O0 -DDEBUG -D_DEBUG
> Linker flags (Release): -Wl,--gc-sections -Wl,--as-needed
> Linker flags (Debug): -Wl,--gc-sections -Wl,--as-needed
> ccache: NO
> Precompiled headers: NO
> Extra dependencies: m pthread cudart_static dl rt nppc nppial nppicc nppidei nppif nppig nppim nppist nppisu nppitc npps
> cublas cudnn cufft -L/usr/local/cuda-11.4/lib64
> -L/usr/lib/aarch64-linux-gnu
> 3rdparty dependencies:
>
> OpenCV modules:
> To be built: aruco barcode bgsegm bioinspired calib3d ccalib core cudaarithm cudabgsegm cudacodec cudafeatures2d
> cudafilters cudaimgproc cudalegacy cudaobjdetect cudaoptflow
> cudastereo cudawarping cudev datasets dnn dnn_objdetect dnn_superres
> dpm face features2d flann freetype fuzzy gapi hfs highgui img_hash
> imgcodecs imgproc intensity_transform line_descriptor mcc ml objdetect
> optflow phase_unwrapping photo plot python2 python3 quality rapid reg
> rgbd saliency shape stereo stitching structured_light superres
> surface_matching text tracking ts video videoio videostab
> wechat_qrcode world xfeatures2d ximgproc xobjdetect xphoto
> Disabled: -
> Disabled by dependency: -
> Unavailable: alphamat cvv hdf java julia matlab ovis sfm viz
> Applications: tests perf_tests examples apps
> Documentation: NO
> Non-free algorithms: YES
>
> GUI:
> GTK+: YES (ver 3.24.20)
> GThread : YES (ver 2.64.6)
> GtkGlExt: NO
> VTK support: NO
>
> Media I/O:
> ZLib: /usr/lib/aarch64-linux-gnu/libz.so (ver 1.2.11)
> JPEG: /usr/lib/aarch64-linux-gnu/libjpeg.so (ver 80)
> WEBP: build (ver encoder: 0x020f)
> PNG: /usr/lib/aarch64-linux-gnu/libpng.so (ver 1.6.37)
> TIFF: /usr/lib/aarch64-linux-gnu/libtiff.so (ver 42 / 4.1.0)
> JPEG 2000: build (ver 2.4.0)
> OpenEXR: /usr/lib/aarch64-linux-gnu/libImath.so
> /usr/lib/aarch64-linux-gnu/libIlmImf.so
> /usr/lib/aarch64-linux-gnu/libIex.so
> /usr/lib/aarch64-linux-gnu/libHalf.so
> /usr/lib/aarch64-linux-gnu/libIlmThread.so (ver 2_3)
> HDR: YES
> SUNRASTER: YES
> PXM: YES
> PFM: YES
>
> Video I/O:
> DC1394: YES (2.2.5)
> FFMPEG: YES
> avcodec: YES (58.54.100)
> avformat: YES (58.29.100)
> avutil: YES (56.31.100)
> swscale: YES (5.5.100)
> avresample: YES (4.0.0)
> GStreamer: YES (1.16.3)
> v4l/v4l2: YES (linux/videodev2.h)
>
> Parallel framework: OpenMP
>
> Trace: YES (with Intel ITT)
>
> Other third-party libraries:
> Lapack: NO
> Eigen: NO
> Custom HAL: YES (carotene (ver 0.0.1))
> Protobuf: build (3.5.1)
>
> NVIDIA CUDA: YES (ver 11.4, CUFFT CUBLAS FAST_MATH)
> NVIDIA GPU arch: 87
> NVIDIA PTX archs:
>
> cuDNN: YES (ver 8.6.0)
>
> OpenCL: YES (no extra features)
> Include path: /home/../tk_ws/opencv-4.5.4/3rdparty/include/opencl/1.2
> Link libraries: Dynamic load
>
> Python 2:
> Interpreter: /usr/bin/python2.7 (ver 2.7.18)
> Libraries: /usr/lib/aarch64-linux-gnu/libpython2.7.so (ver 2.7.18)
> numpy: /usr/lib/python2.7/dist-packages/numpy/core/include (ver 1.16.5)
> install path: lib/python2.7/dist-packages/cv2/python-2.7
>
> Python 3:
> Interpreter: /usr/bin/python3 (ver 3.8.10)
> Libraries: /usr/lib/aarch64-linux-gnu/libpython3.8.so (ver 3.8.10)
> numpy: /usr/lib/python3/dist-packages/numpy/core/include (ver 1.17.4)
> install path: /usr/lib/python3/dist-packages/cv2/python-3.8
>
> Python (for build): /usr/bin/python3
>
> Java:
> ant: NO
> JNI: NO
> Java wrappers: NO
> Java tests: NO
>
> Install to: /usr/local
> -----------------------------------------------------------------
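The CUDA-related lines are easy to miss in a dump this long. As a rough sketch, the text returned by `cv2.getBuildInformation()` can be reduced to just those entries. The function names `gpu_summary` and `built_with_cuda` are hypothetical, and the sample text below is copied from the output above rather than generated live.

```python
from typing import Dict

# Keys of interest, spelled as they appear in the build information.
_WANTED = ("NVIDIA CUDA", "NVIDIA GPU arch", "cuDNN")


def gpu_summary(build_info: str) -> Dict[str, str]:
    """Collect the CUDA-related entries from cv2.getBuildInformation() text."""
    summary = {}
    for line in build_info.splitlines():
        # Drop indentation and any leading "> " quote marker, then split on the first colon.
        key, sep, value = line.strip().lstrip("> ").partition(":")
        if sep and key.strip() in _WANTED:
            summary[key.strip()] = value.strip()
    return summary


def built_with_cuda(summary: Dict[str, str]) -> bool:
    """True only if both CUDA and cuDNN are reported as enabled."""
    return all(summary.get(k, "").startswith("YES") for k in ("NVIDIA CUDA", "cuDNN"))


# Lines copied from the build information shown above.
sample = """\
  NVIDIA CUDA:                   YES (ver 11.4, CUFFT CUBLAS FAST_MATH)
    NVIDIA GPU arch:             87
    NVIDIA PTX archs:
  cuDNN:                         YES (ver 8.6.0)
"""

info = gpu_summary(sample)
print(info)
print("CUDA and cuDNN enabled:", built_with_cuda(info))
```

On the device, calling `gpu_summary(cv2.getBuildInformation())` and printing `cv2.__file__` in each environment shows which installation Python imports. It says nothing about the library a compiled C++ binary links against, which may be a different one.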
Device info:
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "Orin"
CUDA Driver Version / Runtime Version 11.4 / 11.4
CUDA Capability Major/Minor version number: 8.7
Total amount of global memory: 30623 MBytes (32110186496 bytes)
(016) Multiprocessors, (128) CUDA Cores/MP: 2048 CUDA Cores
GPU Max Clock rate: 1300 MHz (1.30 GHz)
Memory Clock rate: 1300 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 4194304 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total shared memory per multiprocessor: 167936 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 1536
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: Yes
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Managed Memory: Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 0 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.4, CUDA Runtime Version = 11.4, NumDevs = 1
Result = PASS
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_May__4_00:02:26_PDT_2022
Cuda compilation tools, release 11.4, V11.4.239
Build cuda_11.4.r11.4/compiler.31294910_0
User response:
It works now.
What helped was adding libopencv_world.so when the program is run.
Thank you all for the suggestions.
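One way to see which OpenCV library a compiled binary resolves at run time is to inspect the output of `ldd`. The sketch below parses such output in Python. The function name `opencv_libs` is hypothetical, and the sample text, including its paths and addresses, is invented for illustration and was not captured from the Jetson.

```python
import re
from typing import Dict

# An `ldd` line for a resolved library looks like:
#   <soname> => <absolute path> (<load address>)
# Unresolved entries ("=> not found") do not match, because a path must start with "/".
_LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(/\S+)")


def opencv_libs(ldd_output: str) -> Dict[str, str]:
    """Map each OpenCV soname in `ldd` output to the file it resolves to."""
    libs = {}
    for line in ldd_output.splitlines():
        match = _LDD_LINE.match(line)
        if match and match.group(1).startswith("libopencv_"):
            libs[match.group(1)] = match.group(2)
    return libs


# Hypothetical `ldd ./test` output, for illustration only.
sample = """\
    libopencv_world.so.4.5 => /usr/local/lib/libopencv_world.so.4.5 (0x0000ffff9c000000)
    libstdc++.so.6 => /usr/lib/aarch64-linux-gnu/libstdc++.so.6 (0x0000ffff9be00000)
    libopencv_core.so.4.5 => not found
"""

resolved = opencv_libs(sample)
for soname, path in resolved.items():
    print(soname, "->", path)
```

On the device, the real input could come from `subprocess.run(["ldd", "./test"], capture_output=True, text=True).stdout`. If the resolved path is not under the install prefix used for the build (/usr/local here), the binary is loading a different OpenCV.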