JETSON ORIN: setUpNet DNN module was not built with CUDA backend; switching to CPU

I am working on an NVIDIA Jetson Orin and trying to enable OpenCV's DNN module with CUDA support.

My system specifications: Jetson Orin, Ubuntu 20.04, CUDA 11.4, cuDNN 8.6.0, OpenCV 4.5.4, YOLOv3.

I run the code from https://learnopencv.com/deep-learning-based-object-detection-using-yolov3-with-opencv-python-c/ either as root directly on the Jetson Orin (Linux) or remotely from Visual Studio 2019. Object detection inference works, but it falls back to the CPU instead of using the CUDA GPU backend. I did build OpenCV with CUDA, cuDNN, OPENCV_DNN_CUDA, BUILD_OPENCV_DNN and other modules enabled.

My cmake command line:

    …:~$ cd tk_ws/opencv-4.5.4/build/
    …/tk_ws/opencv-4.5.4/build$ cmake \
        -D CMAKE_BUILD_TYPE=RELEASE \
        -D CMAKE_INSTALL_PREFIX=/usr/local \
        -D WITH_TBB=ON \
        -D ENABLE_FAST_MATH=ON \
        -D CUDA_FAST_MATH=ON \
        -D WITH_CUBLAS=ON \
        -D WITH_CUDA=ON \
        -D CUDA_ARCH_BIN=8.7 \
        -D BUILD_opencv_cudacodec=OFF \
        -D WITH_CUDNN=ON \
        -D OPENCV_DNN_CUDA=ON \
        -D WITH_V4L=ON \
        -D WITH_QT=OFF \
        -D WITH_OPENGL=ON \
        -D WITH_GSTREAMER=ON \
        -D OPENCV_GENERATE_PKGCONFIG=ON \
        -D OPENCV_PC_FILE_NAME=opencv4.pc \
        -D OPENCV_ENABLE_NONFREE=ON \
        -D OPENCV_PYTHON3_INSTALL_PATH=/usr/lib/python3.8/dist-packages \
        -D OPENCV_EXTRA_MODULES_PATH=/home/…/tk_ws/opencv_contrib-4.5.4/modules \
        -D INSTALL_PYTHON_EXAMPLES=OFF \
        -D INSTALL_C_EXAMPLES=OFF \
        -D BUILD_EXAMPLES=OFF \
        -D BUILD_OPENCV_DNN=ON \
        -D BUILD_OPENCV_WORLD=ON \
        -D CUDNN_VERSION=8.6.0 \
        -D CUDNN_LIBRARY=/usr/lib/aarch64-linux-gnu/libcudnn.so \
        -D CUDNN_INCLUDE_DIR=/usr/include \
        -D PYTHON3_EXECUTABLE=/usr/bin/python3 \
        -D PYTHON3_INCLUDE_DIR=/usr/include/python3.8 \
        -D PYTHON3_LIBRARY=/usr/lib/aarch64-linux-gnu/libpython3.8.so \
        -D PYTHON3_NUMPY_INCLUDE_DIRS=/usr/lib/python3/dist-packages/numpy/core/include \
        -D PYTHON3_PACKAGES_PATH=/usr/lib/python3/dist-packages \
        -D OpenCV_DIR=/home/…/tk_ws/opencv-4.5.4/build \
        ..

My configuration summary reports:

    General configuration for OpenCV 4.5.4
    Version control: unknown
    NVIDIA CUDA: YES (ver 11.4, CUFFT CUBLAS FAST_MATH)
    NVIDIA GPU arch: 87
    NVIDIA PTX archs:
    cuDNN: YES (ver 8.6.0)

However, the calls

    net.setPreferableBackend(DNN_BACKEND_OPENCV);
    net.setPreferableTarget(DNN_TARGET_CPU);

produce this warning:

setUpNet DNN module was not built with CUDA backend; switching to CPU
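
For context, OpenCV prints this warning when the CUDA backend (DNN_BACKEND_CUDA with DNN_TARGET_CUDA) is requested but the library that is actually loaded has no CUDA DNN support. Below is a minimal Python sketch of guarding that request. `pick_dnn_names` and `configure_net` are hypothetical helper names, and a non-zero device count only shows that the core module sees CUDA, not that the DNN backend was built with cuDNN.

```python
def pick_dnn_names(cuda_device_count):
    """Map a CUDA device count to (backend, target) attribute names on cv2.dnn."""
    if cuda_device_count > 0:
        return "DNN_BACKEND_CUDA", "DNN_TARGET_CUDA"
    return "DNN_BACKEND_OPENCV", "DNN_TARGET_CPU"


def configure_net(net):
    """Apply the chosen backend/target to a cv2.dnn network (hypothetical helper)."""
    import cv2  # imported lazily so the selection logic can be tried without OpenCV

    # getCudaEnabledDeviceCount() returns 0 when the loaded OpenCV has no CUDA support.
    names = pick_dnn_names(cv2.cuda.getCudaEnabledDeviceCount())
    net.setPreferableBackend(getattr(cv2.dnn, names[0]))
    net.setPreferableTarget(getattr(cv2.dnn, names[1]))
    return names
```

With a loaded network this could be called as `configure_net(net)`. If it returns the CPU pair on a machine that has a GPU, that hints that a non-CUDA OpenCV build is being picked up at run time.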

Also, for test 1 (test.cpp):

    $ g++ test.cpp `pkg-config opencv4 --cflags --libs` -o test
    $ ./test
    (opencv_dnn_cuda) ………:$ cd workspace/opencv-4.5.4
    (opencv_dnn_cuda) ……….:/workspace/opencv-4.5.4$ g++ test.cpp `pkg-config opencv4 --cflags --libs` -o test
    (opencv_dnn_cuda) ………..:~/workspace/opencv-4.5.4$ ./test
    Error: OpenCV(4.5.4) /home/ubuntu/build_opencv/opencv/modules/core/include/opencv2/core/private.cuda.hpp:106: error: (-216:No CUDA support) The library is compiled without CUDA support in function 'throw_no_cuda'
    
    0.000296

For test 2, in Python (test.py), my results are as follows.

In the virtual environment:

    lenovo:~/workspace/opencv-4.5.4$ cd build/
    lenovo:~/workspace/opencv-4.5.4/build$ workon opencv_dnn_cuda
    (opencv_dnn_cuda) lenovo:~/workspace/opencv-4.5.4/build$ python3 test.py
    CUDA using GPU --- 0.76786208152771 seconds ---
    CPU --- 1.6167230606079102 seconds ---

Without the virtual environment:

    lenovo:~$ cd workspace/opencv-4.5.4/build
    lenovo:~/workspace/opencv-4.5.4/build$ python3 test.py
    CUDA using GPU --- 1.4443564414978027 seconds ---
    CPU --- 3.2225732803344727 seconds ---

The odd part in the first two cases is this warning:

    [ WARN:0] global /home/ubuntu/build_opencv/opencv/modules/dnn/src/dnn.cpp (1447) setUpNet DNN module was not built with CUDA backend; switching to CPU

The same odd thing shows up here as in the C++ error above: I have no path /home/ubuntu/build_opencv/opencv/ on my system.

My actual path is /home/lenovo/workspace/opencv-4.5.4/modules/dnn/src/dnn.cpp
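
The mismatch can be checked mechanically. OpenCV log and error messages embed the source path of the tree the loaded binary was compiled from, so a foreign path suggests a different OpenCV build is being loaded than the one just compiled. Here is a small Python sketch that pulls the source root out of such a message; the helper names are hypothetical and the regular expression assumes the path contains `/modules/`.

```python
import re


def source_root_from_message(message):
    """Return the OpenCV source root embedded in a log/error message, or None."""
    match = re.search(r"(/[^\s:()]+?)/modules/", message)
    return match.group(1) if match else None


def built_from(message, expected_root):
    """True when the source root embedded in the message equals expected_root."""
    root = source_root_from_message(message)
    return root is not None and root.rstrip("/") == expected_root.rstrip("/")


# The warning quoted in the post.
warning = ("[ WARN:0] global /home/ubuntu/build_opencv/opencv/modules/dnn/src/dnn.cpp "
           "(1447) setUpNet DNN module was not built with CUDA backend; switching to CPU")

print(source_root_from_message(warning))                          # /home/ubuntu/build_opencv/opencv
print(built_from(warning, "/home/lenovo/workspace/opencv-4.5.4"))  # False
```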

```
$ python3
Python 3.8.10 (default, Jun 22 2022, 20:18:18)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cv2
>>> cv2.__version__
'4.5.4'
>>> exit()
```

pkg-config --modversion opencv4
4.5.4

I would appreciate your help.

Thank you.

In addition, as suggested by @Micka, I set CUDA_ARCH_PTX="" and reran the make and sudo make install steps for the OpenCV installation. I then tried sample.cpp to check whether CUDA is enabled.

Test 1: sample.cpp

#include <iostream>
#include "opencv2/opencv.hpp"
#include "opencv2/core/cuda.hpp"
#include "opencv2/cudaarithm.hpp"
#include "opencv2/cudaoptflow.hpp"
#include <opencv2/core/utility.hpp>
#include "opencv2/core.hpp"

int main (int argc, char* argv[])
{
    try
    {
        cv::cuda::GpuMat src_host = cv::imread("/home/...../test1.png", CV_LOAD_IMAGE_GRAYSCALE);
        cv::cuda::GpuMat dst, src;
        src.upload(src_host);

        cv::cuda::threshold(src, dst, 128.0, 255.0, CV_THRESH_BINARY);

        cv::cuda::GpuMat result_host;
        dst.download(result_host);

        cv::imshow("Result", result_host);
        cv::waitKey();
    }
    catch(const cv::Exception& ex)
    {
        std::cout << "Error: " << ex.what() << std::endl;
    }
    return 0;
}

Result:

    $ g++ sample.cpp `pkg-config opencv4 --cflags --libs` -o sample

    sample.cpp: In function ‘int main(int, char**)’:
    sample.cpp:13:47: error: conversion from ‘cv::Mat’ to non-scalar type ‘cv::cuda::GpuMat’ requested
       13 |         cv::cuda::GpuMat src_host = cv::imread("/home/sesotec-ai-2/darknet/test1.png", cv::IMREAD_GRAYSCALE);
          |                                     ~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    sample.cpp:17:53: error: ‘CV_THRESH_BINARY’ was not declared in this scope
       17 |         cv::cuda::threshold(src, dst, 128.0, 255.0, CV_THRESH_BINARY);
          |                                                     ^~~~~~~~~~~~~~~~

Test 2: test.cpp

#include <iostream>
#include <ctime>
#include <cmath>
#include "bits/time.h"

#include <opencv2/core.hpp>
#include <opencv2/opencv.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/imgcodecs.hpp>

#include <opencv2/core/cuda.hpp>
#include <opencv2/cudaarithm.hpp>
#include <opencv2/cudaimgproc.hpp>
#include "opencv2/core/cuda.hpp"
#include "opencv2/cudaarithm.hpp"
#include "opencv2/cudaoptflow.hpp"
#include <opencv2/core/utility.hpp>
#include "opencv2/core.hpp"
#include <opencv2/world.hpp>

#define TestCUDA true
using namespace cv;
using namespace cuda;

int main() {
    std::clock_t begin = std::clock();

        try {
            cv::String filename = "/home/.../.../test1.png";
            cv::Mat srcHost = cv::imread(filename, cv::IMREAD_GRAYSCALE);

            for(int i=0; i<1000; i++) {
                if(TestCUDA) {
                    cv::cuda::GpuMat dst, src;
                    src.upload(srcHost);

                    //cv::cuda::threshold(src,dst,128.0,255.0, CV_THRESH_BINARY);
                    cv::cuda::bilateralFilter(src,dst,3,1,1);

                    cv::Mat resultHost;
                    dst.download(resultHost);
                } else {
                    cv::Mat dst;
                    cv::bilateralFilter(srcHost,dst,3,1,1);
                }
            }

            //cv::imshow("Result",resultHost);
            //cv::waitKey();

        } catch(const cv::Exception& ex) {
            std::cout << "Error: " << ex.what() << std::endl;
        }

    std::clock_t end = std::clock();
    std::cout << double(end-begin) / CLOCKS_PER_SEC  << std::endl;
}

Result:

 $ g++ test.cpp `pkg-config opencv4 --cflags --libs` -o test
 $ ./test

Error: OpenCV(4.5.4) /home/ubuntu/build_opencv/opencv/modules/core/include/opencv2/core/private.cuda.hpp:106: error: (-216:No CUDA support) The library is compiled without CUDA support in function 'throw_no_cuda'

0.047034

Build Information in the test.py

import cv2 as cv; 
print(cv.getBuildInformation())

                                    

> $ python3 test.py
> 
> CUDA using GPU --- 1.428098201751709 seconds ---
> CPU --- 3.2391397953033447 seconds ---
> 
> General configuration for OpenCV 4.5.4 =====================================
>   Version control:               unknown
> 
>   Extra modules:
>     Location (extra):            /home/.../tk_ws/opencv_contrib-4.5.4/modules
>     Version control (extra):     unknown
> 
>   Platform:
>     Timestamp:                   2022-11-15T13:55:20Z
>     Host:                        Linux 5.10.65-tegra aarch64
>     CMake:                       3.16.3
>     CMake generator:             Unix Makefiles
>     CMake build tool:            /usr/bin/make
>     Configuration:               Release
> 
>   CPU/HW features:
>     Baseline:                    NEON FP16
> 
>   C/C++:
>     Built as dynamic libs?:      YES
>     C++ standard:                11
>     C++ Compiler:                /usr/bin/c++  (ver 9.4.0)
>     C++ flags (Release):         -fsigned-char -ffast-math -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Wsuggest-override -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections    -fvisibility=hidden -fvisibility-inlines-hidden -fopenmp -O3 -DNDEBUG  -DNDEBUG
>     C++ flags (Debug):           -fsigned-char -ffast-math -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Wsuggest-override -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections    -fvisibility=hidden -fvisibility-inlines-hidden -fopenmp -g  -O0 -DDEBUG -D_DEBUG
>     C Compiler:                  /usr/bin/cc
>     C flags (Release):           -fsigned-char -ffast-math -W -Wall -Werror=return-type -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections    -fvisibility=hidden -fopenmp -O3 -DNDEBUG  -DNDEBUG
>     C flags (Debug):             -fsigned-char -ffast-math -W -Wall -Werror=return-type -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections    -fvisibility=hidden -fopenmp -g  -O0 -DDEBUG -D_DEBUG
>     Linker flags (Release):      -Wl,--gc-sections -Wl,--as-needed
>     Linker flags (Debug):        -Wl,--gc-sections -Wl,--as-needed
>     ccache:                      NO
>     Precompiled headers:         NO
>     Extra dependencies:          m pthread cudart_static dl rt nppc nppial nppicc nppidei nppif nppig nppim nppist nppisu nppitc npps cublas cudnn cufft -L/usr/local/cuda-11.4/lib64 -L/usr/lib/aarch64-linux-gnu
>     3rdparty dependencies:
> 
>   OpenCV modules:
>     To be built:                 aruco barcode bgsegm bioinspired calib3d ccalib core cudaarithm cudabgsegm cudacodec cudafeatures2d
> cudafilters cudaimgproc cudalegacy cudaobjdetect cudaoptflow
> cudastereo cudawarping cudev datasets dnn dnn_objdetect dnn_superres
> dpm face features2d flann freetype fuzzy gapi hfs highgui img_hash
> imgcodecs imgproc intensity_transform line_descriptor mcc ml objdetect
> optflow phase_unwrapping photo plot python2 python3 quality rapid reg
> rgbd saliency shape stereo stitching structured_light superres
> surface_matching text tracking ts video videoio videostab
> wechat_qrcode world xfeatures2d ximgproc xobjdetect xphoto
>     Disabled:                    -
>     Disabled by dependency:      -
>     Unavailable:                 alphamat cvv hdf java julia matlab ovis sfm viz
>     Applications:                tests perf_tests examples apps
>     Documentation:               NO
>     Non-free algorithms:         YES
> 
>   GUI:
>     GTK+:                        YES (ver 3.24.20)
>       GThread :                  YES (ver 2.64.6)
>       GtkGlExt:                  NO
>     VTK support:                 NO
> 
>   Media I/O:
>     ZLib:                        /usr/lib/aarch64-linux-gnu/libz.so (ver 1.2.11)
>     JPEG:                        /usr/lib/aarch64-linux-gnu/libjpeg.so (ver 80)
>     WEBP:                        build (ver encoder: 0x020f)
>     PNG:                         /usr/lib/aarch64-linux-gnu/libpng.so (ver 1.6.37)
>     TIFF:                        /usr/lib/aarch64-linux-gnu/libtiff.so (ver 42 / 4.1.0)
>     JPEG 2000:                   build (ver 2.4.0)
>     OpenEXR:                     /usr/lib/aarch64-linux-gnu/libImath.so
> /usr/lib/aarch64-linux-gnu/libIlmImf.so
> /usr/lib/aarch64-linux-gnu/libIex.so
> /usr/lib/aarch64-linux-gnu/libHalf.so
> /usr/lib/aarch64-linux-gnu/libIlmThread.so (ver 2_3)
>     HDR:                         YES
>     SUNRASTER:                   YES
>     PXM:                         YES
>     PFM:                         YES
> 
>   Video I/O:
>     DC1394:                      YES (2.2.5)
>     FFMPEG:                      YES
>       avcodec:                   YES (58.54.100)
>       avformat:                  YES (58.29.100)
>       avutil:                    YES (56.31.100)
>       swscale:                   YES (5.5.100)
>       avresample:                YES (4.0.0)
>     GStreamer:                   YES (1.16.3)
>     v4l/v4l2:                    YES (linux/videodev2.h)
> 
>   Parallel framework:            OpenMP
> 
>   Trace:                         YES (with Intel ITT)
> 
>   Other third-party libraries:
>     Lapack:                      NO
>     Eigen:                       NO
>     Custom HAL:                  YES (carotene (ver 0.0.1))
>     Protobuf:                    build (3.5.1)
> 
>   NVIDIA CUDA:                   YES (ver 11.4, CUFFT CUBLAS FAST_MATH)
>     NVIDIA GPU arch:             87
>     NVIDIA PTX archs:
> 
>   cuDNN:                         YES (ver 8.6.0)
> 
>   OpenCL:                        YES (no extra features)
>     Include path:                /home/../tk_ws/opencv-4.5.4/3rdparty/include/opencl/1.2
>     Link libraries:              Dynamic load
> 
>   Python 2:
>     Interpreter:                 /usr/bin/python2.7 (ver 2.7.18)
>     Libraries:                   /usr/lib/aarch64-linux-gnu/libpython2.7.so (ver 2.7.18)
>     numpy:                       /usr/lib/python2.7/dist-packages/numpy/core/include (ver 1.16.5)
>     install path:                lib/python2.7/dist-packages/cv2/python-2.7
> 
>   Python 3:
>     Interpreter:                 /usr/bin/python3 (ver 3.8.10)
>     Libraries:                   /usr/lib/aarch64-linux-gnu/libpython3.8.so (ver 3.8.10)
>     numpy:                       /usr/lib/python3/dist-packages/numpy/core/include (ver 1.17.4)
>     install path:                /usr/lib/python3/dist-packages/cv2/python-3.8
> 
>   Python (for build):            /usr/bin/python3
> 
>   Java:
>     ant:                         NO
>     JNI:                         NO
>     Java wrappers:               NO
>     Java tests:                  NO
> 
>   Install to:                    /usr/local
> -----------------------------------------------------------------
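
The CUDA-related entries of a report like this can be picked out programmatically, which makes it easier to compare what different interpreters or environments load. The following is only a sketch: the helper names are hypothetical, and it assumes the `key: value` layout shown in the report above (with or without a leading `>` quote marker).

```python
import re

# Entries of interest; the names match the labels used in the build report.
WANTED = ("NVIDIA CUDA", "NVIDIA GPU arch", "cuDNN", "Install to")


def summarize_build_info(text):
    """Pick a few 'key: value' entries out of an OpenCV build report."""
    summary = {}
    for line in text.splitlines():
        # Tolerate leading whitespace and a '>' quote marker before the key.
        match = re.match(r"^[>\s]*([^:]+?):\s*(.*)$", line)
        if match and match.group(1) in WANTED:
            summary[match.group(1)] = match.group(2).strip()
    return summary


def has_cuda_dnn(summary):
    """True when both CUDA and cuDNN are reported as enabled."""
    return (summary.get("NVIDIA CUDA", "").startswith("YES")
            and summary.get("cuDNN", "").startswith("YES"))
```

In a Python session this would be fed with `summarize_build_info(cv2.getBuildInformation())`. Printing `cv2.__file__` alongside it shows which installed module produced the report.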

Device info:

./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Orin"
CUDA Driver Version / Runtime Version          11.4 / 11.4
CUDA Capability Major/Minor version number:    8.7
Total amount of global memory:                 30623 MBytes (32110186496 bytes)
(016) Multiprocessors, (128) CUDA Cores/MP:    2048 CUDA Cores
GPU Max Clock rate:                            1300 MHz (1.30 GHz)
  Memory Clock rate:                             1300 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 4194304 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total shared memory per multiprocessor:        167936 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            Yes
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 0 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.4, CUDA Runtime Version = 11.4, NumDevs = 1
Result = PASS

$ nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_May__4_00:02:26_PDT_2022
Cuda compilation tools, release 11.4, V11.4.239
Build cuda_11.4.r11.4/compiler.31294910_0

CodePudding user response：

It works now.

Adding libopencv_world.so when the program is run solved it.
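
One way to see which OpenCV shared objects a compiled binary resolves at run time is to inspect the output of `ldd`. The Python sketch below parses that output; `run_ldd` and `opencv_libs_from_ldd` are hypothetical helper names, and the parser assumes the usual `name => path (address)` line layout.

```python
import re
import subprocess


def run_ldd(binary):
    """Return the text printed by `ldd <binary>` (Linux only)."""
    return subprocess.run(["ldd", binary], capture_output=True, text=True).stdout


def opencv_libs_from_ldd(ldd_output):
    """Map each libopencv_* entry to its resolved path (None if not found)."""
    resolved = {}
    pattern = r"\s*(libopencv_\S+)\s+=>\s+(.+?)(?:\s+\(0x[0-9a-fA-F]+\))?\s*$"
    for line in ldd_output.splitlines():
        match = re.match(pattern, line)
        if match:
            name, target = match.groups()
            resolved[name] = None if target == "not found" else target
    return resolved
```

For the test program above this would be called as `opencv_libs_from_ldd(run_ldd("./test"))`. A path outside the install prefix used in the cmake command (/usr/local) would suggest that another OpenCV build is found first, which the dynamic loader's search order (for example via LD_LIBRARY_PATH) can change.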

Thank you all for the suggestions