startTime = time.time()
blob = cv2.dnn.blobFromImage(img, float(1.0/255.0), (frameWidth,frameHeight), (0,0,0), swapRB = True, crop = False)
yolo.setInput(blob)
layerOutput = yolo.forward(outputLayers)
endTime = time.time()
This is the Python code in which I measure the time.
auto start = chrono::steady_clock::now();
blob = blobFromImage(images[i], 1.0f/255.0f, Size(frameWidth, frameHeight), Scalar(0,0,0), true, false);
net.setInput(blob);
net.forward(layerOutput, getOutputsNames(net));
auto end = chrono::steady_clock::now();
This is the C++ code in which I measure the time.
In C++: blob is of type Mat, layerOutput is of type vector<Mat>, and getOutputsNames returns the output layer names in a vector<string>.
In Python: blob is of type numpy.ndarray, layerOutput is of type tuple, and outputLayers is a list object.
Both the backend and the target are the same: the backend is OpenCV and the target is CPU, and I am using the same YOLOv4 weight and config files from the same directories.
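For reference, the net is created and configured along these lines on the Python side (a rough sketch, not my exact script; the file names are placeholders and yolo is the same net object as in the timing snippet above):

yolo = cv2.dnn.readNetFromDarknet("yolov4.cfg", "yolov4.weights")  # same cfg/weights as in C++
yolo.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)  # OpenCV backend
yolo.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)  # CPU target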
When measuring the time, it takes ~180-200 ms in Python, yet in C++ it takes ~220-250 ms. Since C++ is a compiled language, I expected it to be quite a bit faster than Python, which surprisingly is not the case.
What might be the reason that Python works faster than C++ here? Also, what solutions would you suggest?
Thanks in advance!
CodePudding user response:
First of all, it is not like that: the python-opencv module is mostly written in C and C++. The compiled code is then used from Python through some kind of binding. For example, I wrote an image registration code in C++ and used pybind to call the binary (.so file) from Python. So if it works faster, it is not just because of Python.
Now, coming to your question: there can be multiple reasons why your Python code works faster than the C++ one.
As mentioned in the comments, it can be because of the optimized binaries used by the OpenCV module for Python.
Check the CPU utilization and the number of threads while running both programs.
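From Python you can inspect what the installed cv2 build enables and how many threads OpenCV uses, for example (standard cv2 API):

import cv2
print(cv2.getBuildInformation())  # check the CPU/HW features and parallel framework sections
print(cv2.getNumThreads())  # number of threads OpenCV uses for parallel regions

The C++ equivalents are cv::getBuildInformation() and cv::getNumThreads(), so you can compare the two builds directly.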
It can also be the data structures and algorithms that cause the speed difference. For example, you are using Mat to read the image in C++, while in Python you are using NumPy, which has its core written in C, which is faster than C++.
CodePudding user response:
I figured out what the problem was: I had built a customized OpenCV for C++ to take advantage of the CUDA cores in my Jetson Orin, while Python uses the general OpenCV installed in another directory, which does not have CUDA support. When I switched the C++ compilation to the general OpenCV build, it ran as fast as expected; in my customized build I had also customized the CPU parallelization, which seems to be slower than the default one.
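A quick way to check which OpenCV build the Python side actually loads, and whether it has CUDA support, is something like this (a sketch using the standard cv2 API):

import cv2
print(cv2.__file__)  # path of the cv2 module that is actually imported
print(cv2.cuda.getCudaEnabledDeviceCount())  # 0 means no usable CUDA support in this build
print(cv2.getBuildInformation())  # the CUDA section shows how the library was built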