How can I improve performance when looping through results from net.forward(outputLayers) using OpenCV

I'm working with Python 3.8.10, OpenCV 4.3.0 and CUDA 10.2 on Ubuntu 20.04. I trained YOLOv3 and generated a weights file for the 23 objects that I want to detect in my images. It all works fine, and I can draw beautiful boxes in Python around objects whose detection confidence lies above a certain threshold.

However, it takes more than half a second to loop through all outputs provided by

outputs = net.forward(outputLayers)

when filtering for results above a certain confidence level.

Here's my loop:

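import numpy as np

boxes = []
confs = []
class_ids = []

# Each output holds one row per candidate box:
# [center_x, center_y, w, h, objectness, class scores...]
for output in outputs:
    for detect in output:
        scores = detect[5:]
        class_id = np.argmax(scores)
        conf = scores[class_id]
        if conf > 0.7:
            # Box values are normalized, so scale them to pixels
            center_x = int(detect[0] * width)
            center_y = int(detect[1] * height)
            w = int(detect[2] * width)
            h = int(detect[3] * height)
            # Convert the box center to its top-left corner
            x = int(center_x - w / 2)
            y = int(center_y - h / 2)
            boxes.append([x, y, w, h])
            confs.append(float(conf))
            class_ids.append(class_id)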

It takes so long because of the size of outputs. It seems that net.forward(outputLayers) returns all possible detections, regardless of confidence. In my case, that is more than 30000 elements that I have to loop through.
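For context, the row count follows from the network layout rather than from the image content. Here is a minimal sketch of the arithmetic, assuming the standard YOLOv3 head (three detection scales at strides 32, 16 and 8, with 3 anchors per grid cell) and a square input. The function name `yolov3_candidate_count` is made up for illustration, and a custom .cfg may differ.

```python
def yolov3_candidate_count(input_size, strides=(32, 16, 8), anchors_per_cell=3):
    """Count the candidate boxes produced for a square input.

    Assumes the standard YOLOv3 layout: one grid per stride,
    with a fixed number of anchors per grid cell.
    """
    return sum((input_size // s) ** 2 * anchors_per_cell for s in strides)


for size in (416, 608):
    print(size, yolov3_candidate_count(size))
# 416 10647
# 608 22743
```

Under these assumptions the count grows with the square of the input size, so a smaller blob size is one way to shrink the array, at some cost in detection quality.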

Is there any way to throw out detections below a certain confidence level while the model still resides on the GPU? net.forward() doesn't seem to allow any filtering, as far as I could find out. Any ideas would be highly appreciated!

CodePudding user response：

To improve performance, you can try having net.forward(..) detect only the 23 objects you want, instead of all 80 objects that the YOLOv3 detector with coco.names provides.

If you want to detect only 23 specific objects with YOLOv3, there is a section of the darkflow repo that explains how to change the output.

Note: you will need to retrain your model. The repo demonstrates this with an example of 3 classes.

I believe the answer here will be more helpful. It covers 1 specific class, so just adjust the steps to 23 objects.

CodePudding user response：

I couldn't find a way to reduce the number of outputs of net.forward(), but the comment by Christoph Rackwitz provided me with a very satisfactory way of speeding up my code. Instead of looping through the output numpy array, I applied

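# net.forward(outputLayers) returns one array per output layer; stack them first
outputs = np.vstack(outputs)
mask = (outputs[:, 5:].max(axis=1) > 0.7)
outputs = outputs[mask]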

which reduced the size of my outputs from around 30000 to 33 in 3.8e-06 seconds.
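For anyone who wants the whole loop from the question in vectorized form, here is a rough sketch built on the same masking idea. It assumes `outputs` is the sequence of arrays returned by net.forward(outputLayers), each with rows laid out as [center_x, center_y, w, h, objectness, class scores...]. The helper name `filter_detections` and its parameters are my own, not part of OpenCV.

```python
import numpy as np


def filter_detections(outputs, width, height, conf_threshold=0.7):
    """Vectorized version of the per-detection loop (hypothetical helper)."""
    # Stack the per-layer arrays into one (N, 5 + num_classes) array
    detections = np.vstack(outputs)

    # Best class and its score for every candidate box
    scores = detections[:, 5:]
    class_ids = scores.argmax(axis=1)
    confs = scores[np.arange(len(scores)), class_ids]

    # Drop everything at or below the threshold in one step
    keep = confs > conf_threshold
    detections, class_ids, confs = detections[keep], class_ids[keep], confs[keep]

    # Scale normalized boxes to pixels, truncating like int() in the loop
    center_x = (detections[:, 0] * width).astype(int)
    center_y = (detections[:, 1] * height).astype(int)
    w = (detections[:, 2] * width).astype(int)
    h = (detections[:, 3] * height).astype(int)

    # Convert box centers to top-left corners
    x = (center_x - w / 2).astype(int)
    y = (center_y - h / 2).astype(int)

    boxes = np.stack([x, y, w, h], axis=1)
    return boxes.tolist(), confs.astype(float).tolist(), class_ids.tolist()
```

It would be called as `boxes, confs, class_ids = filter_detections(outputs, width, height)`. The three lists are plain Python lists, like the ones the original loop builds, so whatever code consumed those should work with these too.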