Any ideas on how to remove the stamp from this bill prior to OCR processing?
CodePudding user response:
Here is one way to do that in Python/OpenCV.
- Read input
- Threshold on yellow
- Dilate to fill out rectangle
- Get largest contour
- Draw a white filled contour on the input image
- Save the results
Input:
import cv2
import numpy as np
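# read image (cv2.imread returns None instead of raising if the file cannot be read)
img = cv2.imread('form_with_label.jpg')
if img is None:
    raise FileNotFoundError('form_with_label.jpg could not be read')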
# threshold on yellow (OpenCV stores pixels in BGR order, so yellow is low blue, high green, high red)
lower = (0, 200, 200)
upper = (100, 255, 255)
thresh = cv2.inRange(img, lower, upper)
# apply dilate morphology
kernel = np.ones((9,9), np.uint8)
mask = cv2.morphologyEx(thresh, cv2.MORPH_DILATE, kernel)
# get largest contour
contours = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
contours = contours[0] if len(contours) == 2 else contours[1]
big_contour = max(contours, key=cv2.contourArea)
x,y,w,h = cv2.boundingRect(big_contour)
# draw filled white contour on input
result = img.copy()
cv2.drawContours(result,[big_contour],0,(255,255,255),-1)
# save the threshold, mask, and result images
cv2.imwrite('form_with_label_thresh.png',thresh)
cv2.imwrite('form_with_label_mask.png',mask)
cv2.imwrite('form_with_label_removed.png',result)
# show the images
cv2.imshow("THRESH", thresh)
cv2.imshow("MASK", mask)
cv2.imshow("RESULT", result)
cv2.waitKey(0)
cv2.destroyAllWindows()
Thresholded Image:
Morphology Dilated Image:
Result:
CodePudding user response:
I have dabbled in OCR before, and my best suggestion is to use color matching.
- Read your image using the Image package (e.g. the Image module from PIL/Pillow).
- Convert it to RGB (or RGBA) format.
- Filter the colors to show only orange and brown, not red and yellow.
How this works:
Each pixel in the image is represented by red, green, and blue channels, each with a value from 0 to 255 (256 possible values).
Use NumPy to filter out the unwanted values, then convert the result to sepia, grayscale, or negative, and you will be able to pass it to your OCR engine.
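The color filtering described above could be sketched roughly as follows. This is only an illustration on a tiny synthetic RGB array: the helper names `keep_color_range` and `to_grayscale`, and the numeric range for "orange and brown", are assumptions and would need tuning for a real scan. With a real file, the array would typically come from something like `np.asarray(Image.open(path).convert("RGB"))` using Pillow.

```python
import numpy as np

def keep_color_range(rgb, lower, upper):
    """Keep pixels whose R, G, B all lie in [lower, upper]; paint the rest white.

    Hypothetical helper for illustration; returns (filtered image, boolean mask).
    """
    rgb = np.asarray(rgb, dtype=np.uint8)
    lower = np.array(lower, dtype=np.uint8)
    upper = np.array(upper, dtype=np.uint8)
    # True where every channel of the pixel falls inside the range
    in_range = np.all((rgb >= lower) & (rgb <= upper), axis=-1)
    out = np.full_like(rgb, 255)      # start from an all-white image
    out[in_range] = rgb[in_range]     # copy over only the matching pixels
    return out, in_range

def to_grayscale(rgb):
    """Convert an RGB uint8 array to grayscale with the common BT.601 luma weights."""
    weights = np.array([0.299, 0.587, 0.114])
    return np.rint(rgb.astype(np.float64) @ weights).astype(np.uint8)

# 2x2 synthetic image: orange, brown / red, yellow
demo = np.array([[[255, 165, 0], [139, 69, 19]],
                 [[255, 0, 0], [255, 255, 0]]], dtype=np.uint8)

# Assumed range covering orange and brown but excluding pure red and yellow
filtered, mask = keep_color_range(demo, (120, 50, 0), (255, 180, 80))
gray = to_grayscale(filtered)
```

Here the red and yellow pixels fall outside the assumed range and are painted white, so only the orange and brown pixels survive into the grayscale image that would be handed to the OCR step.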