I'm using the tesseract R package to recognize text within an image file. However, when plotting the bounding box for a word, the coordinates don't seem to be right.
- Why is the bounding box for the word "This" not aligned with the text "This" in the image?
- Is there an easier way to plot all bounding box rectangles on the image?
library(tesseract)
library(magick)
library(tidyverse)
text <- tesseract::ocr_data("http://jeroen.github.io/images/testocr.png")
image <- image_read("http://jeroen.github.io/images/testocr.png")
text <- text %>%
separate(bbox, c("x1", "y1", "x2", "y2"), ",") %>%
mutate(
x1 = as.numeric(x1),
y1 = as.numeric(y1),
x2 = as.numeric(x2),
y2 = as.numeric(y2)
)
plot(image)
rect(
xleft = text$x1[1],
ybottom = text$y1[1],
xright = text$x2[1],
ytop = text$y2[1])
CodePudding user response:
This is simply because the x, y co-ordinates of images are counted from the top left, whereas rect
counts from the bottom left. The image is 480 pixels tall, so we can do:
plot(image)
rect(
xleft = text$x1[1],
ybottom = 480 - text$y1[1],
xright = text$x2[1],
ytop = 480 - text$y2[1])
Or, to show this generalizes:
plot(image)
rect(
xleft = text$x1,
ybottom = magick::image_info(image)$height - text$y1,
xright = text$x2,
ytop = magick::image_info(image)$height - text$y2,
border = sample(128, nrow(text)))