handwritten circular annotation removal from scanned image

Time:05-20

I have images of printed text with handwritten circular annotations on them, and I want to remove the annotations from the input image. I have tried some of the thresholding methods discussed in various StackOverflow threads, but the results are not what I expected.

The method I am using works well when the annotation is made with a blue pen. When the annotation is made with a black pen, however, thresholding and erosion do not produce the expected output.

Here is a sample of the results I achieved on blue annotations with the thresholding and erosion method:

Image (input on the left and output on the right)


Code

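import cv2
import numpy as np
from google.colab.patches import cv2_imshow

img = cv2.imread("/content/Scan_0101.jpg")
cv2_imshow(img)

# OpenCV loads images as BGR, so channel 0 is the blue channel.
wimg = img[:, :, 0]
# Paper and blue ink are bright in the blue channel (-> 255); black text stays dark (-> 0).
ret, thresh = cv2.threshold(wimg, 120, 255, cv2.THRESH_BINARY)
cv2_imshow(thresh)
kernel = np.ones((3, 3), np.uint8)
erosion = cv2.erode(thresh, kernel, iterations=1)
# The eroded mask is contained in thresh, so this OR equals thresh itself.
mask = cv2.bitwise_or(erosion, thresh)
#cv2_imshow(erosion)

# Replicate the mask over all three channels, then paint the masked pixels white.
white = np.ones(img.shape, np.uint8) * 255
white[:, :, 0] = mask
white[:, :, 1] = mask
white[:, :, 2] = mask
result = cv2.bitwise_or(img, white)
# Eroding a dark-on-white image thickens the remaining dark text.
erosion = cv2.erode(result, kernel, iterations=1)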

Here is a sample of the results I achieved on black annotations with the same thresholding and erosion method:

Image (input on the left and output on the right)


Is there a suggested approach for this problem, or a way to modify this code so that it produces the required results?

CodePudding user response:

You must understand that because the gray values of the printed text and those of the handwriting are in the same range, no thresholding method in the world can separate them.
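To make this concrete, here is a minimal sketch using made-up BGR pixel values (they are assumptions for illustration, not measurements from the scans). It applies the same blue-channel test as the question's `cv2.threshold(wimg, 120, 255, cv2.THRESH_BINARY)` call, which whitens a pixel when its blue value is above 120.

```python
import numpy as np

# Hypothetical BGR values, chosen only to illustrate the argument.
pixels = {
    "paper": (245, 245, 245),
    "printed_text": (40, 40, 40),
    "blue_pen": (180, 70, 50),
    "black_pen": (45, 42, 40),
}


def is_whitened(bgr, thresh=120):
    """Return True if the blue channel (index 0 in BGR) exceeds thresh,
    i.e. the pixel would end up white in the question's pipeline."""
    return bool(np.asarray(bgr, dtype=np.uint8)[0] > thresh)


whitened = {name: is_whitened(bgr) for name, bgr in pixels.items()}
for name, flag in whitened.items():
    print(f"{name:13s} whitened={flag}")
```

With these values the blue pen is whitened along with the paper, while the black pen lands on the same side of the threshold as the printed text, so any threshold that erases the black pen would also erase the text.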

In fact, no algorithm at all can succeed without "hints" about what characters do or do not look like. Even the stroke thickness is not distinctive enough.

The only possible indication is that the circles are made of a long, smooth stroke. And removing them where they cross the characters is simply impossible.

CodePudding user response:

Some parts of the handwritten circles (those in the line-spacing regions) may be extractable, under the assumption that many letters align on the same line. In your image, I think the upper and lower parts of the circle would be extracted.

Then, if you track the black line starting from the extracted parts (assuming smooth curvature), it may be possible to detect the whole connected handwritten circle.

However, in practice I think such a process will run into many difficulties, especially because characters will be cut off when the curve is removed.
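The first step of this idea, extracting the ink that falls in the line-spacing regions, can be sketched with a horizontal projection profile. This is only a sketch on a synthetic page: the function names and the `min_fill` threshold are assumptions, and a real scan would need deskewing and a tuned threshold.

```python
import numpy as np


def find_text_rows(ink, min_fill=0.2):
    """ink: 2D bool array, True where a pixel is dark.
    A row counts as part of a text line when at least min_fill of its
    pixels are ink (hypothetical threshold)."""
    return ink.mean(axis=1) >= min_fill


def ink_between_lines(ink, min_fill=0.2):
    """Keep only the ink that lies outside the detected text rows."""
    candidates = ink.copy()
    candidates[find_text_rows(ink, min_fill), :] = False
    return candidates


# Synthetic page: two "text lines" (rows 5-9 and 20-24) with ink in every
# second column, plus a vertical pen stroke in column 10 from row 3 to 26.
page = np.zeros((30, 40), dtype=bool)
page[5:10, ::2] = True
page[20:25, ::2] = True
page[3:27, 10] = True

candidates = ink_between_lines(page)
print(candidates.sum())  # number of stroke pixels found in the gaps
```

Only the stroke pixels in the gaps survive. The parts of the stroke inside the text rows are lost, so they would still have to be recovered by the tracking step, with the difficulties noted above.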
