Convert colored scanned image to black-white image

Time:10-28

I have scanned images like this one, where the background color is not necessarily consistent. An ImageMagick command like the following applies a single fixed threshold, which does not work well for images without a consistent background.

convert in.jpg -threshold 35% -type bilevel -monochrome -compress LZW out.pdf

Can anybody suggest a robust way to generate the corresponding monochrome image while preserving all of the text?

I think the best method would probably be based on deep learning, but DL may take too many resources to run. Non-DL methods are also welcome if they give reasonably good results.

[image: sample scanned page]
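Why a single global cut-off struggles can be shown with a small Python sketch (plain lists, no ImageMagick). `fixed_threshold` approximates `-threshold 35%`, assuming the usual rule that pixels brighter than the cut-off become white and the rest black. `gradient_page` is a hypothetical synthetic test image, not one of the scans from the question.

```python
def fixed_threshold(gray, fraction):
    """Binarize a grayscale image (rows of floats in [0, 1]) with one global cut-off.

    Pixels brighter than `fraction` become white (1), the rest black (0);
    fraction=0.35 roughly corresponds to `-threshold 35%`.
    """
    return [[1 if v > fraction else 0 for v in row] for row in gray]


def gradient_page(width, height, ink_col):
    """Hypothetical test page: the background fades from 0.9 (left) to 0.25 (right),
    and one column of 'ink' is 0.2 darker than the background around it."""
    page = []
    for _ in range(height):
        row = []
        for x in range(width):
            background = 0.9 - 0.65 * x / (width - 1)
            row.append(background - 0.2 if x == ink_col else background)
        page.append(row)
    return page


if __name__ == "__main__":
    page = gradient_page(width=10, height=3, ink_col=2)
    for row in fixed_threshold(page, 0.35):
        print(row)  # [1, 1, 1, 1, 1, 1, 1, 1, 0, 0] on every row
```

The ink in column 2 sits on a bright part of the page and stays white, so it is lost, while the plain background in columns 8 and 9 is dark enough to turn black. No single fraction fixes both.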

Answer:

You can improve on that with ImageMagick's -lat (local adaptive threshold) option, as follows:

Input:

[image]

convert coahuila.jpg -colorspace gray -negate -lat 50x50+10% -negate result.jpg

[image: result]

Note: I suggest saving as PNG or TIFF to avoid extra JPG compression. I only saved as JPG since this forum has size restrictions.
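To show why the command negates before and after -lat, here is a small Python sketch of a local adaptive threshold. It assumes my reading of `-lat WxH+offset%`: a pixel becomes white only if it is brighter than the mean of its W x H neighborhood plus the offset, otherwise black. Clipping windows at the border and recomputing each mean naively are simplifications, not ImageMagick's implementation.

```python
def local_mean(gray, x, y, w, h):
    """Mean of the w x h window centred on (x, y); the window is clipped at the
    image border (a simplification of ImageMagick's virtual-pixel handling)."""
    rows, cols = len(gray), len(gray[0])
    y0, y1 = max(0, y - h // 2), min(rows, y + h // 2 + 1)
    x0, x1 = max(0, x - w // 2), min(cols, x + w // 2 + 1)
    window = [gray[j][i] for j in range(y0, y1) for i in range(x0, x1)]
    return sum(window) / len(window)


def lat(gray, w, h, offset):
    """Local adaptive threshold on floats in [0, 1]: 1 (white) only where a pixel is
    brighter than its local mean plus `offset`, otherwise 0 (black)."""
    return [[1 if v > local_mean(gray, x, y, w, h) + offset else 0
             for x, v in enumerate(row)]
            for y, row in enumerate(gray)]


def negate(img):
    """Invert values in [0, 1], like -negate."""
    return [[1 - v for v in row] for row in img]


def negate_lat_negate(gray, w=50, h=50, offset=0.10):
    """Mirror of `-negate -lat WxH+offset% -negate`.

    -lat keeps pixels that stand out brighter than their surroundings, but ink is
    darker than paper, so the image is inverted first (ink becomes the bright
    feature) and inverted back afterwards (black text on white)."""
    return negate(lat(negate(gray), w, h, offset))


if __name__ == "__main__":
    # Hypothetical 1-row scan: the paper darkens to the right; ink at columns 2 and 7.
    row = [[0.9, 0.85, 0.55, 0.8, 0.75, 0.5, 0.45, 0.15, 0.4, 0.35]]
    print(negate_lat_negate(row, w=5, h=1, offset=0.10))
    # [[1, 1, 0, 1, 1, 1, 1, 0, 1, 1]]
```

The defaults w=50, h=50, offset=0.10 mirror the `-lat 50x50+10%` values in the command; the tiny example uses a 5x1 window only so the numbers are easy to check by hand.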

Answer:

Adaptive thresholding gives a good result (for instance size 43 and offset 28, although these settings are not critical).
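The answer does not say which tool or exact formula it used. A common formulation, used for example by OpenCV's mean adaptive threshold, keeps a pixel white if it is brighter than the mean of its size x size neighborhood minus the offset; reading "size 43, offset 28" that way on 0-255 gray values is an assumption. This Python sketch uses a summed-area table so each local mean costs the same regardless of window size.

```python
def integral_image(gray):
    """Summed-area table with a zero border: S[y][x] = sum of gray[0:y][0:x]."""
    rows, cols = len(gray), len(gray[0])
    S = [[0] * (cols + 1) for _ in range(rows + 1)]
    for y in range(rows):
        running = 0  # sum of row y up to and including column x
        for x in range(cols):
            running += gray[y][x]
            S[y + 1][x + 1] = S[y][x + 1] + running
    return S


def adaptive_mean_threshold(gray, size=43, offset=28):
    """gray: rows of ints 0..255. Output 255 where a pixel is brighter than
    (mean of its size x size neighbourhood) - offset, else 0.
    Windows are clipped at the border; other libraries may pad the border instead."""
    if size % 2 == 0:
        raise ValueError("size must be odd so the window is centred on the pixel")
    rows, cols = len(gray), len(gray[0])
    S = integral_image(gray)
    r = size // 2
    out = []
    for y in range(rows):
        y0, y1 = max(0, y - r), min(rows, y + r + 1)
        row = []
        for x in range(cols):
            x0, x1 = max(0, x - r), min(cols, x + r + 1)
            # Box sum over rows y0..y1-1 and columns x0..x1-1 in O(1).
            total = S[y1][x1] - S[y0][x1] - S[y1][x0] + S[y0][x0]
            mean = total / ((y1 - y0) * (x1 - x0))
            row.append(255 if gray[y][x] > mean - offset else 0)
        out.append(row)
    return out


if __name__ == "__main__":
    # Hypothetical 1-row scan with a darkening background; ink at columns 2 and 7.
    scan = [[230, 220, 150, 210, 200, 140, 130, 40, 110, 100]]
    print(adaptive_mean_threshold(scan, size=5, offset=28))
    # [[255, 255, 0, 255, 255, 255, 255, 0, 255, 255]]
```

Because the threshold follows the local mean, a positive offset leaves slowly varying background white and only marks pixels that are clearly darker than their surroundings, which is why the exact values matter little.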

[image: result]

Answer:

Here is another ImageMagick 6 approach using division normalization and thresholding, followed by some morphology to connect the rows of text and then connected-components processing to find the largest black connected region. From that region we extract the bounding box and use its dimensions to crop the normalized, thresholded image and pad it back out with white to the size of the original image.

Input:

[image]

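# Width x height of the original scan, used to pad the result back to size.
dims=$(convert coahuila.jpg -format "%wx%h" info:)
# Divide by a blurred copy (division normalization), threshold, save that image,
# connect the text rows with morphology, then print the WxH+X+Y bounding box
# of the largest black connected component.
bbox=$(convert coahuila.jpg \
-colorspace gray \
\( +clone -blur 0x20 \) \
+swap \
-compose divide -composite \
-threshold 70% \
-type bilevel \
+write coahuila_div.png \
-morphology open rectangle:1x55 \
-morphology close rectangle:3x1 \
-define connected-components:verbose=true \
-define connected-components:mean-color=true \
-define connected-components:keep-top=1 \
-connected-components 8 coahuila_ccl.png | \
grep "gray(0)" | awk '{print $2}')
# Crop the normalized, thresholded image to that box and pad it with white to the original size.
convert coahuila_div.png \
-crop "$bbox" +repage \
-background white -gravity center -extent "$dims" \
coahuila_processed.png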

Division Normalized and Thresholded Image:

[image]
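The division-normalization step can be sketched in Python. In the command, `\( +clone -blur 0x20 \) +swap -compose divide -composite` divides the grayscale image by a heavily blurred copy of itself, which estimates the uneven paper background; afterwards the paper sits near 1.0 everywhere, so one fixed `-threshold 70%` works. The sketch swaps the Gaussian blur for a simple box (mean) blur to keep it short, and `radius` is a hypothetical parameter standing in for the blur sigma.

```python
def box_blur(gray, radius):
    """Mean over a (2*radius+1)^2 window clipped at the border; a stand-in for
    ImageMagick's Gaussian -blur 0x20 (assumption: any wide low-pass filter
    gives a usable estimate of the paper background)."""
    rows, cols = len(gray), len(gray[0])
    out = []
    for y in range(rows):
        out_row = []
        for x in range(cols):
            window = [gray[j][i]
                      for j in range(max(0, y - radius), min(rows, y + radius + 1))
                      for i in range(max(0, x - radius), min(cols, x + radius + 1))]
            out_row.append(sum(window) / len(window))
        out.append(out_row)
    return out


def divide_normalize(gray, radius, eps=1e-6):
    """gray / blurred(gray), clamped to 1.0: paper becomes ~1.0, ink stays well below."""
    blurred = box_blur(gray, radius)
    return [[min(1.0, v / max(b, eps)) for v, b in zip(row, brow)]
            for row, brow in zip(gray, blurred)]


def threshold(gray, fraction):
    """1 (white) where the value exceeds `fraction`, else 0 (black), like -threshold."""
    return [[1 if v > fraction else 0 for v in row] for row in gray]


if __name__ == "__main__":
    # Hypothetical 1-row scan: background fades from 0.9 to 0.35; the ink at
    # columns 2 and 9 is about half as bright as the paper around it.
    scan = [[0.9, 0.85, 0.4, 0.75, 0.7, 0.65, 0.6, 0.55, 0.5, 0.225, 0.4, 0.35]]
    print(threshold(scan, 0.35))                              # [[1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0]]
    print(threshold(divide_normalize(scan, radius=2), 0.70))  # [[1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1]]
```

On the raw scan the fixed cut-off loses the ink in column 2 and blackens the background in column 11; after division both ink columns are found and the background stays white.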

Morphology Connected Image:

[image]
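The two -morphology calls can be sketched with minimum and maximum filters. As I understand ImageMagick's convention, erode and dilate act on white pixels (erode is a minimum filter, dilate a maximum filter). On black text over white paper, `open rectangle:1x55` (erode, then dilate, with a kernel 1 pixel wide and 55 tall) therefore fills white gaps shorter than the kernel between rows of text, and `close rectangle:3x1` removes black features narrower than 3 pixels. This naive Python version illustrates that reading; it is not ImageMagick's code.

```python
def _rank_filter(img, kw, kh, pick):
    """Apply `pick` (min or max) over a kw x kh rectangle centred on each pixel.
    Windows are clipped at the border, which for min/max equals edge replication."""
    rows, cols = len(img), len(img[0])
    rx, ry = kw // 2, kh // 2
    return [[pick(img[j][i]
                  for j in range(max(0, y - ry), min(rows, y + ry + 1))
                  for i in range(max(0, x - rx), min(cols, x + rx + 1)))
             for x in range(cols)]
            for y in range(rows)]


def erode(img, kw, kh):
    return _rank_filter(img, kw, kh, min)  # shrinks white, grows black


def dilate(img, kw, kh):
    return _rank_filter(img, kw, kh, max)  # grows white, shrinks black


def morph_open(img, kw, kh):
    """Erode then dilate: removes white features the kernel cannot fit inside."""
    return dilate(erode(img, kw, kh), kw, kh)


def morph_close(img, kw, kh):
    """Dilate then erode: removes black features the kernel cannot fit inside."""
    return erode(dilate(img, kw, kh), kw, kh)


if __name__ == "__main__":
    # Hypothetical single-column strip: two text rows (0) separated by a 2-pixel
    # white gap, followed by a 6-pixel white margin.
    strip = [[v] for v in [0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1]]
    print([r[0] for r in morph_open(strip, 1, 5)])  # [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
```

The short gap between the two text rows is filled, so they merge into one black region, while the margin, taller than the 5-pixel kernel, survives.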

Final Result:

[image]
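The last two stages, finding the bounding box of the largest black connected region and then cropping and padding, can be sketched as follows. This is a plain breadth-first flood fill with 8-connectivity (matching `-connected-components 8`), not ImageMagick's algorithm. The box is returned as (x, y, width, height), the same information as the WxH+X+Y geometry that the awk step extracts. As in the answer, the box can come from one image (the morphology-connected mask) and be applied to another (the thresholded image). Centering uses floor division, so the offset may differ from `-gravity center -extent` by a pixel.

```python
from collections import deque


def largest_black_bbox(mask):
    """mask: rows of 0 (black) / 1 (white). Return (x, y, w, h) of the largest
    8-connected black region, or None if there is no black pixel."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    best = None  # (pixel_count, bbox)
    for sy in range(rows):
        for sx in range(cols):
            if mask[sy][sx] != 0 or seen[sy][sx]:
                continue
            # Flood-fill one component, tracking its size and extent.
            queue = deque([(sx, sy)])
            seen[sy][sx] = True
            count, x0, y0, x1, y1 = 0, sx, sy, sx, sy
            while queue:
                x, y = queue.popleft()
                count += 1
                x0, y0, x1, y1 = min(x0, x), min(y0, y), max(x1, x), max(y1, y)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < cols and 0 <= ny < rows
                                and not seen[ny][nx] and mask[ny][nx] == 0):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            if best is None or count > best[0]:
                best = (count, (x0, y0, x1 - x0 + 1, y1 - y0 + 1))
    return None if best is None else best[1]


def crop_and_pad(img, bbox, white=1):
    """Crop img to bbox, then centre the crop on a white canvas of img's size."""
    x, y, w, h = bbox
    rows, cols = len(img), len(img[0])
    canvas = [[white] * cols for _ in range(rows)]
    top, left = (rows - h) // 2, (cols - w) // 2
    for j in range(h):
        for i in range(w):
            canvas[top + j][left + i] = img[y + j][x + i]
    return canvas


if __name__ == "__main__":
    # Hypothetical 5x6 mask: one stray black pixel and one larger black blob.
    mask = [[1, 1, 1, 1, 1, 1],
            [0, 1, 1, 1, 1, 1],
            [1, 1, 0, 0, 0, 1],
            [1, 1, 0, 0, 1, 1],
            [1, 1, 1, 1, 1, 1]]
    box = largest_black_bbox(mask)
    print(box)  # (2, 2, 3, 2)
    for row in crop_and_pad(mask, box):
        print(row)
```

The stray pixel at the left edge is discarded because it is not part of the largest region, which is how this approach drops specks and border noise outside the text block.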

Note: if using ImageMagick 7, change convert to magick. You can also replace -threshold 70% with -auto-threshold otsu.
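-auto-threshold otsu chooses the threshold automatically with Otsu's method: it tries every gray level and keeps the one that maximizes the between-class variance of the resulting dark and light groups. A minimal Python sketch on 8-bit values follows; how ImageMagick breaks ties, or which side a pixel exactly equal to the threshold falls on, may differ.

```python
def otsu_threshold(pixels, levels=256):
    """Return the level t that maximizes the between-class variance when pixels
    <= t form the dark class and pixels > t the light class."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_score = 0, -1.0
    w_dark = 0    # number of pixels <= t
    sum_dark = 0  # sum of their gray levels
    for t in range(levels):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        w_light = total - w_dark
        if w_dark == 0 or w_light == 0:
            continue  # one class is empty, so this split separates nothing
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        # Proportional to the between-class variance (the constant 1/total**2 is dropped).
        score = w_dark * w_light * (mean_dark - mean_light) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def binarize(pixels, t):
    """255 (white) above the threshold, 0 (black) at or below it."""
    return [255 if p > t else 0 for p in pixels]


if __name__ == "__main__":
    gray = [20, 22, 25, 200, 210, 220]  # hypothetical ink and paper values
    t = otsu_threshold(gray)
    print(t, binarize(gray, t))  # 25 [0, 0, 0, 255, 255, 255]
```

Otsu works well here because division normalization leaves a clearly two-peaked histogram (paper near white, ink much darker), which is exactly the case the method assumes.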
