Feature extraction of grid points from scanned prints in Python / OpenCV

I have scanned documents that were printed by different inkjet printers (Epson, HP, Canon and so on). Each scan is very high resolution (around 1.6 GB per file), so you can zoom in and see the halftone pattern of the picture, which uses frequency modulation.

My task is to extract features from the grid dots: the patterns of the grid, the distance between the dots, and so on.

The most relevant feature is the size of these dots. Each printer prints dots of a different size, so I have to calculate the mean and standard deviation of the dot size.

Later I will have to train an ML model that classifies a print as coming from a specific printer (so basically: this print belongs to printer XYZ).

But for now I am already struggling with the feature engineering and the preprocessing, as this is actually my first computer-vision project and I am not very familiar with OpenCV.

My idea is to binarize the images with OpenCV and then find the edges (edge detection) with a Sobel or Prewitt filter or something similar. So I think I have to apply some blur first and then an edge detection, maybe?

I am not sure if this is the right approach, which is why I am asking here. What do you think? I would be happy if you could give me some hints or steps toward a good approach.

User response:

Here is one way to do it in Python/OpenCV.

Threshold on color using cv2.inRange(). In this case I threshold on the blue dots. Then get all the external contours to find the isolated regions. From the contours, compute the equivalent circular diameters, then compute their mean and standard deviation.

Input:

import cv2
import numpy as np
import math

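img = cv2.imread("color_dots.png")
if img is None:
    raise FileNotFoundError("could not read color_dots.png")

# threshold on blue color (bounds are in BGR order, as loaded by cv2.imread)
lower = (190,150,100)
upper = (255,255,170)
thresh = cv2.inRange(img, lower, upper)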

# get external contours
contours = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
contours = contours[0] if len(contours) == 2 else contours[1]
count = len(contours)

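if count == 0:
    raise ValueError("no contours found; adjust the threshold range")

# accumulate the sum and the sum of squares of the diameters
total = 0.0
total_sq = 0.0
for cntr in contours:
    # get area from contours and then diameters of equivalent circles
    area = cv2.contourArea(cntr)
    # area = pi*radius**2 = pi*(diameter/2)**2 = (pi/4)*diameter**2
    # diameter = sqrt(4*area/pi) = 2*sqrt(area/pi)
    diameter = 2 * math.sqrt(area/math.pi)
    total = total + diameter
    total_sq = total_sq + diameter * diameter

# compute the mean diameter and the mean squared diameter
average = total/count
average2 = total_sq/count

# compute standard deviation (population form: E[d^2] - E[d]^2)
# clamp at zero to guard against tiny negative values from rounding
variance = max(average2 - average*average, 0.0)
standard_deviation = math.sqrt(variance)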

# print results
print("average:", average)
print("std_dev:", standard_deviation)

# save result
cv2.imwrite("color_dots_blue_threshold.png",thresh)

# display result
cv2.imshow("thresh", thresh)
cv2.waitKey(0)
cv2.destroyAllWindows()

Threshold image:

Results:

average: 3.0747726858108635
std_dev: 0.541288251281962