How to binarize an image when it has white text on a black background and vice versa?

I want to binarize an image for OCR. The code below takes image data as input and returns a binary image. This method works for most images.

For example:

Original:

Original Image Sample

Result:

Binarized Image of Sample

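import cv2
import numpy as np

def preprocessing(image):
    # Convert the BGR input to grayscale.
    image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # The small blur removes noise; the large blur estimates the background.
    blured1 = cv2.medianBlur(image, 3)
    blured2 = cv2.medianBlur(image, 51)
    # Divide by the background estimate to flatten uneven illumination.
    divided = np.ma.divide(blured1, blured2).data
    normed = np.uint8(255 * divided / divided.max())
    # With THRESH_OTSU the threshold is chosen automatically, so the 100 is ignored.
    th, image = cv2.threshold(normed, 100, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    # Erode, then dilate (a morphological opening) to clean up small specks.
    image = cv2.erode(image, np.ones((3, 3), np.uint8))
    image = cv2.dilate(image, np.ones((3, 3), np.uint8))
    return image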

But when I apply the same method to the images attached below, it does not work as expected. It should produce an image whose text is readable when passed to Tesseract.

Original Image 1:

Original Image 1

Preprocessed image:

Pre processed image

Original Image 2:

Original Image 2

Preprocessed image:

Pre processed image

CodePudding user response:

You should probably try implementing the thresholding yourself. I think the Bradley-Roth algorithm (see "Bradley-Roth Adaptive Thresholding Algorithm - How do I get better performance?") could help you with a slight modification: if the neighborhood of a pixel is brighter than 128, highlight the pixels that are darker than it; if the neighborhood is darker than 128, highlight the pixels that are lighter than it.
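The modification described above could be sketched roughly as follows. This is only an illustration, not a reference implementation of Bradley-Roth: the function names `local_mean` and `polarity_aware_threshold`, the `window` size, the percentage `t`, and the `mid` cutoff of 128 are all assumptions chosen for the example. The local mean is computed with an integral image, as in the original algorithm, and the output always has black text on a white background regardless of the input polarity.

```python
import numpy as np

def local_mean(gray, window):
    """Mean of each pixel's window x window neighborhood, via an integral image."""
    h, w = gray.shape
    r = window // 2
    # Integral image padded with a leading row and column of zeros.
    integral = np.zeros((h + 1, w + 1), dtype=np.float64)
    integral[1:, 1:] = gray.astype(np.float64).cumsum(axis=0).cumsum(axis=1)
    # Window bounds for every row and column, clipped to the image borders.
    y0 = np.clip(np.arange(h) - r, 0, h)
    y1 = np.clip(np.arange(h) + r + 1, 0, h)
    x0 = np.clip(np.arange(w) - r, 0, w)
    x1 = np.clip(np.arange(w) + r + 1, 0, w)
    sums = (integral[y1][:, x1] - integral[y0][:, x1]
            - integral[y1][:, x0] + integral[y0][:, x0])
    counts = (y1 - y0)[:, None] * (x1 - x0)[None, :]
    return sums / counts

def polarity_aware_threshold(gray, window=31, t=15, mid=128):
    """Hypothetical Bradley-Roth variant: text becomes 0, background becomes 255."""
    mean = local_mean(gray, window)
    pixel = gray.astype(np.float64)
    bright_neighborhood = mean > mid
    # Bright neighborhood: mark pixels clearly darker than the local mean.
    dark_text = bright_neighborhood & (pixel < mean * (1 - t / 100))
    # Dark neighborhood: mark pixels clearly lighter than the local mean.
    light_text = ~bright_neighborhood & (pixel > mean * (1 + t / 100))
    out = np.full(gray.shape, 255, dtype=np.uint8)
    out[dark_text | light_text] = 0
    return out
```

To try it on the images from the question, the grayscale result of `cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)` can be passed as `gray`. The window probably needs to be noticeably larger than the stroke width of the text, otherwise the inside of a thick white stroke may itself be classified as a bright neighborhood; `window` and `t` will likely need tuning per image.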