I'm working on implementing the Google Vision Detect Multiple Objects API in Python (https://cloud.google.com/vision/docs/object-localizer)
The problem I'm having is that I don't know how to use the boundingPoly normalizedVertices returned in the response to crop the original image with OpenCV.
Example Response
{
  "mid": "/m/01bqk0",
  "name": "Bicycle wheel",
  "score": 0.9423431,
  "boundingPoly": {
    "normalizedVertices": [
      {
        "x": 0.31524897,
        "y": 0.78658724
      },
      {
        "x": 0.44186485,
        "y": 0.78658724
      },
      {
        "x": 0.44186485,
        "y": 0.9692919
      },
      {
        "x": 0.31524897,
        "y": 0.9692919
      }
    ]
  }
}
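For reference, here's roughly how I'm requesting the annotations with the google-cloud-vision Python client (a sketch; the file path is a placeholder):
from google.cloud import vision

client = vision.ImageAnnotatorClient()
with open("bicycle.jpg", "rb") as f:  # placeholder input file
    image = vision.Image(content=f.read())

response = client.object_localization(image=image)
for obj in response.localized_object_annotations:
    # Each vertex carries normalized x/y values in the range [0, 1].
    vertices = [(v.x, v.y) for v in obj.bounding_poly.normalized_vertices]
    print(obj.name, obj.score, vertices)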
CodePudding user response:
First, convert the normalized coordinates to pixel coordinates as follows:
test_coord = (0.5, 0.3)  # normalized (x, y)
IMAGE_SHAPE = (1920, 1080)  # example (width, height) of the original image

def to_pixel_coords(relative_coords):
    # Pair normalized (x, y) with (width, height) and scale each coordinate.
    return tuple(round(coord * dimension) for coord, dimension in zip(relative_coords, IMAGE_SHAPE))
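For example, with the values above:
print(to_pixel_coords(test_coord))  # -> (960, 324)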
After getting the pixel coordinates, let's say they are (x1, y1), (x2, y2), (x3, y3), and (x4, y4). Then you can crop the original image as follows:
top_left_x = min([x1, x2, x3, x4])
top_left_y = min([y1, y2, y3, y4])
bot_right_x = max([x1, x2, x3, x4])
bot_right_y = max([y1, y2, y3, y4])
cropped = img[top_left_y:bot_right_y + 1, top_left_x:bot_right_x + 1]  # +1 because the stop index is excluded in slicing
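Putting it together, here is a minimal end-to-end sketch using the four vertices from the example response (the image paths are placeholders):
import cv2

img = cv2.imread("bicycle.jpg")  # placeholder path to the original image
height, width = img.shape[:2]    # note: OpenCV's shape is (rows, cols) = (height, width)

# The normalizedVertices from the example response, as (x, y) tuples.
normalized_vertices = [
    (0.31524897, 0.78658724),
    (0.44186485, 0.78658724),
    (0.44186485, 0.9692919),
    (0.31524897, 0.9692919),
]

# Scale x by width and y by height to get pixel coordinates.
pixel_vertices = [(round(x * width), round(y * height)) for x, y in normalized_vertices]

xs = [x for x, _ in pixel_vertices]
ys = [y for _, y in pixel_vertices]
crop = img[min(ys):max(ys) + 1, min(xs):max(xs) + 1]
cv2.imwrite("bicycle_wheel_crop.jpg", crop)  # placeholder output path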
CodePudding user response:
You have to unnormalize the coordinates based on the size of the original image in order to obtain the true pixel coordinates.
(number_of_rows, number_of_columns) = image.shape[:2]
x_unnormalized = round(x_normalized * number_of_columns)  # x scales with the width (columns)
y_unnormalized = round(y_normalized * number_of_rows)     # y scales with the height (rows)
...
cropped_image = image[y_unnormalized:y_unnormalized + h, x_unnormalized:x_unnormalized + w]  # h, w: crop height and width
This assumes the values were normalized relative to the image dimensions:
normalized_x = pixel_x / image_width
normalized_y = pixel_y / image_height
If some other normalization was applied, then you have to apply the inverse of that particular normalization instead.
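For example, for a hypothetical 1920x1080 source image, the first vertex of the example response maps back to pixels as:
x = round(0.31524897 * 1920)  # = 605
y = round(0.78658724 * 1080)  # = 850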