Home > other >  how to extract numbers from captcha image in python?
how to extract numbers from captcha image in python?

Time:11-02

I want to extract numbers from captcha image, so I tried this code from this answer enter image description here
outPut: 436359
But, when I tried it on this image:
enter image description here
It gave me nothing, outPut: .
How can I modify my code to get the numbers as a string from the second image?

EDIT:
I tried image A image A

image B image B
How to fix that?

CodePudding user response:

Often, getting OCR just right on an image like this has to do with the order and parameters of the transformations. For example, in the following code snippet, I first convert to grayscale, then erode the pixels, then dilate, then erode again. I use threshold to convert to binary (just blacks and whites) and then dilate and erode one more time. This for me produces the correct value of 859917 and should be reproducible.

import cv2
import numpy as np
import pytesseract

file = 'sample2.jpg'
img = cv2.imread(file)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
ekernel = np.ones((1,2),np.uint8)
eroded = cv2.erode(gray, ekernel, iterations = 1)
dkernel = np.ones((2,3),np.uint8)
dilated_once = cv2.dilate(eroded, dkernel, iterations = 1)
ekernel = np.ones((2,2),np.uint8)
dilated_twice = cv2.erode(dilated_once, ekernel, iterations = 1)
th, threshed = cv2.threshold(dilated_twice, 200, 255, cv2.THRESH_BINARY)
dkernel = np.ones((2,2),np.uint8)
threshed_dilated = cv2.dilate(threshed, dkernel, iterations = 1)
ekernel = np.ones((2,2),np.uint8)
threshed_eroded = cv2.erode(threshed_dilated, ekernel, iterations = 1)
text = pytesseract.image_to_string(threshed_eroded)
print(''.join(x for x in text if x.isdigit()))
  • Related