Extracting coordinates from computer vision inference

I converted this computer vision model (YOLOv7) to ONNX format so that it can be used with the OpenVINO toolkit. From what I have read about how it is used in other applications, it has the characteristics I am after.

I think my question is very basic and comes from not understanding computer vision well enough. Can someone give me some tips on how to loop through the model output to get the bounding boxes, so that I can draw them with OpenCV?

I am running this on CPU with OpenVINO installed via pip:

import cv2
import numpy as np
import matplotlib.pyplot as plt
from openvino.runtime import Core

model_path = "./yolov7.xml"

ie_core = Core()

def model_init(model_path):
    model = ie_core.read_model(model=model_path)
    compiled_model = ie_core.compile_model(model=model, device_name="CPU")
    input_keys = compiled_model.input(0)
    output_keys = compiled_model.output(0)
    return input_keys, output_keys, compiled_model

input_key, output_keys, compiled_model = model_init(model_path)

# resize the image so it works with the model dimensions
image = cv2.resize(image, (width, height))
image = image.transpose((2,0,1))
image = image.reshape(1,3, height,width)

# Run inference on image, trying .output(1) first
boxes = compiled_model([image])[compiled_model.output(1)]

The code runs and outputs an array, but what does this data contain? I expected the output to include bounding box coordinates as well as a confidence value that I could use to filter out bad predictions.

If I print(compiled_model), I get what I think is the model architecture:

<CompiledModel:
inputs[
<ConstOutput: names[input.1] shape{1,3,640,640} type: f32>
]
outputs[
<ConstOutput: names[812] shape{1,25200,85} type: f32>,
<ConstOutput: names[588] shape{1,3,80,80,85} type: f32>,
<ConstOutput: names[669] shape{1,3,40,40,85} type: f32>,
<ConstOutput: names[750] shape{1,3,20,20,85} type: f32>
]>

Does this tell me anything about the model output, such as what the data contains? Or does boxes.shape, which returns (1, 3, 80, 80, 85)?

Looping over the array:

for box in boxes:
    print(box)

This just prints NumPy arrays with lots of float data. Can anyone help me understand, at a high level, what I need to learn to draw bounding boxes around features in the image?

CodePudding user response:

When I tried to reproduce this, your code failed with the error "NameError: name 'image' is not defined". In the output you printed, each ConstOutput entry only describes an input or output port (node) of the model. To check that the model itself works, run your yolov7.xml file with the OpenVINO Benchmark Python Tool (benchmark_app); it should complete without errors.
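As for the NameError, the snippet never loads an image, and width and height are not defined either. A minimal sketch of the missing preparation step follows, assuming the image has already been read and resized to the 640x640 input size shown in the printout (for example with cv2.imread and cv2.resize). The function name to_model_input is hypothetical, and the BGR-to-RGB swap and the scaling to [0, 1] are assumptions based on how YOLO-family models are usually trained; check them against the preprocessing used when your model was exported.

```python
import numpy as np


def to_model_input(image_bgr: np.ndarray) -> np.ndarray:
    """Convert an HxWx3 uint8 BGR image (already resized to the model's
    input size) into a 1x3xHxW float32 array.

    Hypothetical helper, not part of OpenVINO or OpenCV.
    """
    if image_bgr.ndim != 3 or image_bgr.shape[2] != 3:
        raise ValueError("expected an HxWx3 image")
    rgb = image_bgr[:, :, ::-1]               # BGR -> RGB (assumption: model expects RGB)
    chw = rgb.transpose(2, 0, 1)              # HWC -> CHW, as in the question's code
    scaled = chw.astype(np.float32) / 255.0   # assumption: model expects values in [0, 1]
    return np.ascontiguousarray(scaled[np.newaxis, ...])  # add the batch dimension


# Stand-in for a real picture, e.g.
#   image = cv2.resize(cv2.imread("some_image.jpg"), (640, 640))
# where "some_image.jpg" is a placeholder path.
dummy = np.zeros((640, 640, 3), dtype=np.uint8)
dummy[..., 0] = 255  # fill the blue channel (index 0 in BGR order)

input_tensor = to_model_input(dummy)
print(input_tensor.shape, input_tensor.dtype)
```

The resulting array matches the f32 input port of shape {1,3,640,640} and can be passed to compiled_model in place of the undefined image.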

For the OpenVINO API usage needed to create bounding boxes and handle the model output, you can refer to the source code of the Object Detection Python Demo in the OpenVINO samples. Here is another example of creating bounding boxes:

# boxes, threshold, ratio_x, ratio_y, rgb_image and colors are defined elsewhere.
# Each box is expected to be [x_min, y_min, x_max, y_max, conf].
for box in boxes:
    # Pick the confidence from the last position in the array.
    conf = box[-1]
    if conf > threshold:
        # Multiply each corner position by the x or y ratio and convert it to int.
        # If the bounding box touches the top of the image, move the upper edge
        # a little lower so that it stays visible.
        (x_min, y_min, x_max, y_max) = [
            int(max(corner_position * ratio_y, 10)) if idx % 2
            else int(corner_position * ratio_x)
            for idx, corner_position in enumerate(box[:-1])
        ]

        # Draw the box; the parameters of cv2.rectangle are:
        # image, start_point, end_point, color, thickness.
        rgb_image = cv2.rectangle(rgb_image, (x_min, y_min), (x_max, y_max),
                                  colors["green"], 3)
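
Note that the snippet above expects rows of [x_min, y_min, x_max, y_max, conf], which is not the layout of your model's outputs. As a rough sketch, and assuming a standard YOLO-style export trained on the 80 COCO classes, each row of 85 values would be [center_x, center_y, width, height, objectness, 80 class scores]. The (1, 25200, 85) output would then be the three per-scale outputs decoded and concatenated, since 3 * (80*80 + 40*40 + 20*20) = 25200, so it is probably compiled_model.output(0) rather than output(1) that you want. The function name decode_predictions, the thresholds, and the class-agnostic non-maximum suppression below are illustrative choices, not part of OpenVINO:

```python
import numpy as np


def decode_predictions(output, conf_threshold=0.25, iou_threshold=0.45):
    """Turn a (1, N, 85) array into (x_min, y_min, x_max, y_max, score, class_id) tuples.

    Assumed row layout: [cx, cy, w, h, objectness, 80 class scores].
    Hypothetical helper; thresholds are illustrative defaults.
    """
    preds = output[0]                                   # drop the batch dimension -> (N, 85)
    class_ids = preds[:, 5:].argmax(axis=1)             # best class per row
    scores = preds[:, 4] * preds[np.arange(len(preds)), 5 + class_ids]
    keep = scores > conf_threshold                      # filter out weak predictions
    preds, scores, class_ids = preds[keep], scores[keep], class_ids[keep]

    # Convert center/size to corner coordinates.
    cx, cy, w, h = preds[:, 0], preds[:, 1], preds[:, 2], preds[:, 3]
    boxes = np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)

    # Greedy, class-agnostic non-maximum suppression: keep the best box,
    # drop everything that overlaps it too much, repeat.
    order = scores.argsort()[::-1]
    selected = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        selected.append(best)
        x1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_best = (boxes[best, 2] - boxes[best, 0]) * (boxes[best, 3] - boxes[best, 1])
        area_rest = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_best + area_rest - inter + 1e-9)
        order = rest[iou <= iou_threshold]

    return [
        (*boxes[i].round().astype(int).tolist(), float(scores[i]), int(class_ids[i]))
        for i in selected
    ]


# Synthetic stand-in for the model output: two overlapping boxes and one weak box.
fake = np.zeros((1, 3, 85), dtype=np.float32)
fake[0, 0, :5] = [100, 100, 50, 50, 0.9]
fake[0, 0, 5 + 2] = 0.8   # strong detection of class 2
fake[0, 1, :5] = [102, 101, 50, 50, 0.8]
fake[0, 1, 5 + 2] = 0.7   # near-duplicate of the first box
fake[0, 2, :5] = [300, 200, 40, 80, 0.1]
fake[0, 2, 5 + 7] = 0.9   # combined score below the threshold

detections = decode_predictions(fake)
print(detections)
```

With real data you would call decode_predictions(compiled_model([image])[compiled_model.output(0)]). The returned corners are in the coordinates of the resized 640x640 input (assuming the export already decodes them to pixels), so scale them by ratio_x and ratio_y as in the snippet above before passing them to cv2.rectangle.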