I have scanned images like this where the background color is not necessarily consistent. An ImageMagick command like the one below applies a single fixed threshold, which works poorly on images without a consistent background.
convert in.jpg -threshold 35% -type bilevel -monochrome -compress LZW out.pdf
Can anybody suggest a robust way to generate the corresponding monochrome image while preserving all of the text?
I think the best method would probably be based on deep learning, but DL may take too many resources to run. Non-DL methods are also welcome if they give reasonably good results.
CodePudding user response:
You can improve on that by using the -lat (local adaptive threshold) option in ImageMagick, as follows:
Input:
convert coahuila.jpg -colorspace gray -negate -lat 50x50+10% -negate result.jpg
Note: I suggest saving as PNG or TIFF to avoid extra JPG compression. I only saved as JPG since this forum has size restrictions.
CodePudding user response:
You can get a good result with adaptive thresholding (for instance, window size 43 and offset 28, though the exact settings are not critical).
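For anyone who wants to try those numbers with ImageMagick, here is a minimal sketch. It assumes the offset of 28 is on an 8-bit (0-255) gray scale, which the answer does not state, and converts it to the percentage form that -lat accepts. The helper name lat_geometry and the output file name coahuila_adaptive.png are made up for illustration. The -negate ... -negate wrapping is the same pattern as in the earlier -lat answer.

```shell
# Build a -lat geometry (WxH+offset%) from a window size and an offset.
# ASSUMPTION: the offset is given on a 0-255 scale; the answer does not say.
lat_geometry() {
  size=$1
  offset=$2
  # Convert the 8-bit offset to a rounded percentage of full scale.
  pct=$(awk -v o="$offset" 'BEGIN { printf "%.0f", o * 100 / 255 }')
  printf '%sx%s+%s%%\n' "$size" "$size" "$pct"
}

geom=$(lat_geometry 43 28)

# Only run the conversion if ImageMagick 6 and the input image are available.
if command -v convert >/dev/null 2>&1 && [ -f coahuila.jpg ]; then
  convert coahuila.jpg -colorspace gray -negate -lat "$geom" -negate coahuila_adaptive.png
fi
```

Since the setting is not critical, nearby window sizes and offsets should behave similarly; a larger window tolerates slower background variation.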
CodePudding user response:
Here is another ImageMagick 6 approach using division normalization and thresholding, followed by some morphology to connect the rows of text, and then connected components processing to find the largest black connected region. From that region we extract the bounding box and use it to crop the normalized and thresholded image, then pad the result back out with white to the size of the original image.
Input:
dims=$(convert coahuila.jpg -format "%wx%h" info:)
bbox=$(convert coahuila.jpg \
-colorspace gray \
\( +clone -blur 0x20 \) \
+swap \
-compose divide -composite \
-threshold 70% \
-type bilevel \
+write coahuila_div.png \
-morphology open rectangle:1x55 \
-morphology close rectangle:3x1 \
-define connected-components:verbose=true \
-define connected-components:mean-color=true \
-define connected-components:keep-top=1 \
-connected-components 8 coahuila_ccl.png | \
grep "gray(0)" | awk '{print $2}')
convert coahuila_div.png \
-crop $bbox +repage \
-background white -gravity center -extent $dims \
coahuila_processed.png
Division Normalized and Thresholded Image:
Morphology Connected Image:
Final Result:
Note: if using ImageMagick 7, change convert to magick. You can also replace -threshold 70% with -auto-threshold otsu.
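To make the grep "gray(0)" | awk '{print $2}' step in the script easier to follow, here is a small sketch that runs the same filter on a hypothetical sample of the verbose connected-components listing. The numbers in sample_output are made up for illustration and are not from the actual image. Only the row layout (id, bounding box, centroid, area, mean color) follows what ImageMagick prints.

```shell
# Hypothetical sample of the verbose listing (values invented for illustration).
sample_output='Objects (id: bounding-box centroid area mean-color):
  0: 1200x1600+0+0 599.5,799.5 1500000 gray(255)
  1: 900x1100+150+220 598.1,770.3 420000 gray(0)'

# Keep only the row for the black region and print its second field,
# which is the bounding box in WxH+X+Y form, ready to pass to -crop.
bbox=$(printf '%s\n' "$sample_output" | grep "gray(0)" | awk '{print $2}')
echo "$bbox"   # prints 900x1100+150+220
```

The filter relies on the black region being reported as gray(0). If your build prints colors differently (for example as srgb(0,0,0)), the grep pattern would need adjusting.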