I am trying to translate an image using OCR, but the watermark is in the way. Is there any way to remove the orange watermark from the picture, or at least make it lighter? Also, is it possible to do this in bulk (for all images in a folder)?
Here is a picture of what it should look like after the watermark is removed. It is just an example.
CodePudding user response:
You could probably just threshold, but cutting off the smoothing (anti-aliasing) on low-pixel-count text can damage subsequent OCR pretty badly. So instead I created a mask that kills the watermark and then applied it to the original image (this keeps the grey text boundary as well). Another trick that helped was to use the red channel, since the watermark is most saturated in red (~245). Note that this requires OpenCV and C++17.
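#include <opencv2/opencv.hpp>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;
using namespace cv;

int main(int argc, char** argv)
{
    // true: show every intermediate image and wait for a key press.
    // false: overwrite each file in place with the cleaned result.
    bool debugFlag = true;
    std::string path = "C:/Local Software/voyDICOM/resources/images/wmTesting/";

    for (const auto& entry : fs::directory_iterator(path))
    {
        std::string fileName = entry.path().string();
        Mat original = imread(fileName, cv::IMREAD_COLOR);
        if (original.empty()) { continue; } // skip entries imread cannot decode
        if (debugFlag) { imshow("original", original); }

        // Invert so the text is bright and the page background is dark.
        Mat inverted;
        bitwise_not(original, inverted);

        // Split into B, G, R planes (OpenCV stores colour images as BGR).
        std::vector<Mat> channels;
        split(inverted, channels);
        for (int i = 0; i < 3; i++)
        {
            if (debugFlag) { imshow("chan" + std::to_string(i), channels[i]); }
        }

        // The watermark is nearly saturated in red, so it is nearly black in
        // the inverted red channel. Thresholding that channel gives a mask
        // that keeps the text and drops the watermark and background.
        Mat bwImg;
        cv::threshold(channels[2], bwImg, 50, 255, cv::THRESH_BINARY);
        if (debugFlag) { imshow("thresh", bwImg); }

        // Copy only the masked pixels. The rest of outputImg stays zero,
        // which becomes white once the image is inverted back.
        Mat outputImg;
        inverted.copyTo(outputImg, bwImg);
        bitwise_not(outputImg, outputImg);
        if (debugFlag) { imshow("output", outputImg); }

        if (debugFlag) { waitKey(0); }
        else { imwrite(fileName, outputImg); }
    }
}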
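
To make it clearer why masking keeps the grey text edges while a plain threshold would not, here is a rough per-pixel sketch of the same invert, threshold-red, mask, invert-back steps without OpenCV. The `Bgr` alias and the `removeWatermarkPixel` helper are made up for illustration and are not part of the program above; the cutoff of 50 is the threshold value used there.

```cpp
#include <array>
#include <cstdint>

// One pixel in OpenCV's default channel order: blue, green, red.
using Bgr = std::array<std::uint8_t, 3>;

// Hypothetical helper (not an OpenCV function): applies the invert,
// threshold-on-red, mask, invert-back steps to a single pixel.
Bgr removeWatermarkPixel(const Bgr& pixel, std::uint8_t cutoff = 50)
{
    // Invert: dark text becomes bright, the white page becomes dark.
    const Bgr inverted = {
        static_cast<std::uint8_t>(255 - pixel[0]),
        static_cast<std::uint8_t>(255 - pixel[1]),
        static_cast<std::uint8_t>(255 - pixel[2])};

    // THRESH_BINARY keeps a pixel only if the inverted red value is
    // strictly greater than the cutoff.
    const bool keep = inverted[2] > cutoff;

    // Masked-out pixels stay zero, like the untouched part of outputImg.
    const Bgr masked = keep ? inverted : Bgr{0, 0, 0};

    // Invert back: kept pixels return to their original colour and
    // masked-out pixels become white.
    return {
        static_cast<std::uint8_t>(255 - masked[0]),
        static_cast<std::uint8_t>(255 - masked[1]),
        static_cast<std::uint8_t>(255 - masked[2])};
}
```

With an assumed orange pixel of B=120, G=190, R=245, the inverted red value is 10, so the pixel is masked out and comes back white. A mid-grey edge pixel (128, 128, 128) has an inverted red value of 127, so it passes the mask and keeps its original grey instead of being forced to black or white.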
Image showing the benefit of masking over just thresholding: