I'm working on implementing the Google Vision Detect Multiple Objects API in Python (https://cloud.google.com/vision/docs/object-localizer)
The problem I'm having is that I don't know how to use the boundingPoly normalizedVertices returned in the response to crop the original image with OpenCV.
Example Response
{
  "mid": "/m/01bqk0",
  "name": "Bicycle wheel",
  "score": 0.9423431,
  "boundingPoly": {
    "normalizedVertices": [
      {
        "x": 0.31524897,
        "y": 0.78658724
      },
      {
        "x": 0.44186485,
        "y": 0.78658724
      },
      {
        "x": 0.44186485,
        "y": 0.9692919
      },
      {
        "x": 0.31524897,
        "y": 0.9692919
      }
    ]
  }
}
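For reference, here's roughly how I'm requesting the annotations with the google-cloud-vision Python client (a sketch; the file path is a placeholder):
from google.cloud import vision

client = vision.ImageAnnotatorClient()
with open("bicycle.jpg", "rb") as f:  # placeholder input file
    image = vision.Image(content=f.read())

response = client.object_localization(image=image)
for obj in response.localized_object_annotations:
    # Each vertex carries normalized x/y values in the range [0, 1].
    vertices = [(v.x, v.y) for v in obj.bounding_poly.normalized_vertices]
    print(obj.name, obj.score, vertices)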
CodePudding user response:
First, convert the normalized coordinates to pixel coordinates as follows:
test_coord = (0.5, 0.3)  # normalized (x, y)
IMAGE_SHAPE = (1920, 1080)  # example (width, height) of the original image

def to_pixel_coords(relative_coords):
    # Pair normalized (x, y) with (width, height) and scale each coordinate.
    return tuple(round(coord * dimension) for coord, dimension in zip(relative_coords, IMAGE_SHAPE))
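For example, with the values above:
print(to_pixel_coords(test_coord))  # -> (960, 324)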
After getting the pixel coordinates, let's say they are (x1, y1), (x2, y2), (x3, y3), and (x4, y4). Then you can crop the original image as follows:
top_left_x = min([x1, x2, x3, x4])
top_left_y = min([y1, y2, y3, y4])
bot_right_x = max([x1, x2, x3, x4])
bot_right_y = max([y1, y2, y3, y4])
cropped = img[top_left_y:bot_right_y + 1, top_left_x:bot_right_x + 1]  # +1 because the stop index is excluded in slicing
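Putting it together, here is a minimal end-to-end sketch using the four vertices from the example response (the image paths are placeholders):
import cv2

img = cv2.imread("bicycle.jpg")  # placeholder path to the original image
height, width = img.shape[:2]    # note: OpenCV's shape is (rows, cols) = (height, width)

# The normalizedVertices from the example response, as (x, y) tuples.
normalized_vertices = [
    (0.31524897, 0.78658724),
    (0.44186485, 0.78658724),
    (0.44186485, 0.9692919),
    (0.31524897, 0.9692919),
]

# Scale x by width and y by height to get pixel coordinates.
pixel_vertices = [(round(x * width), round(y * height)) for x, y in normalized_vertices]

xs = [x for x, _ in pixel_vertices]
ys = [y for _, y in pixel_vertices]
crop = img[min(ys):max(ys) + 1, min(xs):max(xs) + 1]
cv2.imwrite("bicycle_wheel_crop.jpg", crop)  # placeholder output path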
CodePudding user response:
You have to unnormalize the coordinates based on the size of the original image in order to obtain the true pixel coordinates.
(number_of_rows, number_of_columns) = image.shape[:2]
x_unnormalized = round(x_normalized * number_of_columns)  # x scales with the width (columns)
y_unnormalized = round(y_normalized * number_of_rows)     # y scales with the height (rows)
...
cropped_image = image[y_unnormalized:y_unnormalized + h, x_unnormalized:x_unnormalized + w]  # h, w: crop height and width
This assumes the values were normalized relative to the image dimensions:
normalized_x = pixel_x / image_width
normalized_y = pixel_y / image_height
If some other normalization was applied, then you have to apply the inverse of that particular normalization instead.
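For example, for a hypothetical 1920x1080 source image, the first vertex of the example response maps back to pixels as:
x = round(0.31524897 * 1920)  # = 605
y = round(0.78658724 * 1080)  # = 850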