Pytesseract doesn't recognize simple text in image

Time:09-22

I want to recognize the text in an image like this:

[image]

I am using the following config:

config="--psm 6 --oem 3 -c tessedit_char_whitelist=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ,."

but when I run OCR on the image, I get the following:

1581

1

W

The text in the image looks perfectly clear to me, so I suspect the problem is with pytesseract. Can you help?

Answer:

Preprocessing the image to obtain a binary image before performing OCR seems to work. You could also try resizing the image so that more detail is visible.

[image]

Results

158.1
1
IT

Code

import cv2
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

# Grayscale and Otsu's threshold
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

# Perform text extraction
data = pytesseract.image_to_string(thresh, lang='eng', config='--psm 6')
print(data)

cv2.imshow('thresh', thresh)
cv2.waitKey()