A computer vision model output blob is identified like this:
[image_id, label, conf, x_min, y_min, x_max, y_max], where:
image_id - ID of the image in the batch
label - predicted class ID (0 - person)
conf - confidence for the predicted class
(x_min, y_min) - coordinates of the top left bounding box corner
(x_max, y_max) - coordinates of the bottom right bounding box corner
To inspect it, I am just printing the array manually in the code I am working with, using print(result_blob.tolist())
The output looks like this:
[0.0, 0.0, 0.6192154288291931, 0.36988523602485657, 0.39582735300064087, 0.7380738258361816, 0.9911962151527405]
If I wanted to draw a bounding box with OpenCV around a person in the video feed with a confidence greater than 0.5, how do I loop through these elements in a NumPy array to do so?
This repository also contains a function that does this, but I don't understand how it works or whether there is a better way to write it. Can someone help me understand what detections = result.reshape(-1, 7) does, and the Python enumerate below it? I get the cv2 rectangle drawing, but I am confused about why the function converts floats to ints, etc.
def postprocess(result, image, fps=None):
    """
    Define the postprocess function for output data
    :param: result: the inference results
            image: the original input frame
            fps: average throughput calculated for each frame
    :returns:
            image: the image with bounding box and fps message
    """
    detections = result.reshape(-1, 7)
    for i, detection in enumerate(detections):
        _, image_id, confidence, xmin, ymin, xmax, ymax = detection
        if confidence > 0.5:
            xmin = int(max((xmin * image.shape[1]), 10))
            ymin = int(max((ymin * image.shape[0]), 10))
            xmax = int(min((xmax * image.shape[1]), image.shape[1] - 10))
            ymax = int(min((ymax * image.shape[0]), image.shape[0] - 10))
            cv2.rectangle(image, (xmin, ymin), (xmax, ymax), (0, 255, 0), 2)
            cv2.putText(image, str(round(fps, 2)) + " fps", (5, 20), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 3)
    return image
CodePudding user response:
The function expects result to be a NumPy array with a certain shape. If there are n person detections, the rest of the code expects an array of shape (n, 7), that is, n rows of length 7. This is because there are 7 pieces of information per detection.
Thanks to the reshape, the function is lenient about the shape it is passed: (n, 7), (n * 7,) and (1, n, 7) would all work. Making sure the shape is (n, 7) allows an easy loop over the detections.
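As a rough illustration of the reshape, here is a minimal sketch. The detection values and the three input arrays are made up for the example; only reshape(-1, 7) comes from the function in the question.

```python
import numpy as np

# One made-up detection in the documented layout:
# [image_id, label, conf, x_min, y_min, x_max, y_max]
row = [0.0, 0.0, 0.62, 0.37, 0.40, 0.74, 0.99]

# Three hypothetical inputs that all hold n = 2 detections.
flat = np.array(row * 2)           # shape (14,)
two_d = flat.reshape(2, 7)         # shape (2, 7)
batched = flat.reshape(1, 2, 7)    # shape (1, 2, 7)

# -1 tells NumPy to infer the number of rows from the total size.
shapes = [a.reshape(-1, 7).shape for a in (flat, two_d, batched)]
print(shapes)  # [(2, 7), (2, 7), (2, 7)]
```

If the total number of elements is not a multiple of 7, reshape raises a ValueError, so the call also acts as a light sanity check on the input.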
In general, a loop over the rows of a 2D NumPy array can be written as:
    for row in array:
        ...
Using enumerate on top of this just gives the index of the row as well. However, this index isn't used in this case, so the enumerate is redundant here. The loop was probably copied from another postprocessing function and the enumerate was left in by accident.
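To make the difference concrete, here is a small sketch using plain Python lists with made-up values. Iterating over the rows of a 2D NumPy array behaves the same way.

```python
# Made-up rows; the third value plays the role of the confidence.
rows = [
    [0.0, 0.0, 0.62],
    [0.0, 0.0, 0.31],
]

# enumerate yields (index, row) pairs.
with_index = [(i, row[2]) for i, row in enumerate(rows)]

# A plain loop yields only the rows, which is all postprocess needs.
without_index = [row[2] for row in rows]

print(with_index)     # [(0, 0.62), (1, 0.31)]
print(without_index)  # [0.62, 0.31]
```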
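The unpacking of the detection has image_id as the second element, although according to the layout in the question image_id is the first element and the second is label, so this may be another small error. Since image_id is also not used, it may be nicer to write:
    confidence, xmin, ymin, xmax, ymax = detection[2:]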
The 4 lines that convert xmin etc. do two things:
- Convert from relative coordinates to pixel coordinates (which have to be integers)
- Make sure the rectangle isn't drawn within 10 pixels of the edge of the image
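The two steps above can be sketched as a standalone helper. The name to_pixel_box and its margin parameter are hypothetical, introduced only for this example; the arithmetic mirrors the four lines in postprocess, where image.shape[1] is the frame width and image.shape[0] is the frame height.

```python
def to_pixel_box(xmin, ymin, xmax, ymax, width, height, margin=10):
    """Scale a relative box (values in 0..1) to integer pixel coordinates,
    keeping every edge at least `margin` pixels inside the frame."""
    left = int(max(xmin * width, margin))
    top = int(max(ymin * height, margin))
    right = int(min(xmax * width, width - margin))
    bottom = int(min(ymax * height, height - margin))
    return left, top, right, bottom


# A made-up box spanning the full width of a 640x480 frame.
box = to_pixel_box(0.0, 0.25, 1.0, 0.75, width=640, height=480)
print(box)  # (10, 120, 630, 360)
```

The int conversion matters because cv2.rectangle expects integer pixel coordinates for its corner points, while the model reports coordinates as fractions of the frame size.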