I want to recognize a image like this:
I am using the following config:
config="--psm 6 --oem 3 -c tessedit_char_whitelist=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ,."
but when I try to convert that, I get the following:
1581
1
W
I think that the image shows really clearly what is written and think that there is a problem with pytesseract. Can you help?
CodePudding user response:
Preprocessing the image to obtain a binary image before performing OCR seems to work. You could also try to resize the image so that more details would be seen
Results
158.1
1
IT
import cv2
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
# Grayscale and Otsu's threshold
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY cv2.THRESH_OTSU)[1]
# Perform text extraction
data = pytesseract.image_to_string(thresh, lang='eng', config='--psm 6')
print(data)
cv2.imshow('thresh', thresh)
cv2.waitKey()