I have a set of pictures that are all different when compared literally, pixel for pixel. However, the set may contain two images of the same thing that differ only in hue. These two images also differ slightly in their white areas: a few pixels do not agree on whether they are white or non-white.
What algorithm can find such a pair of pictures? The two are tinted in different hues and have small differences in their white pixels. All the images have the same dimensions.
The standard approach will probably not work here, i.e. overriding
public boolean equals(Object obj) {}
public int hashCode() {}
because we need to find similarity despite the differences, not exact equality.
CodePudding user response:
What you need to calculate is a perceptual hash.
While normal hash algorithms try to produce vastly different results for slight changes in the hashed data, a perceptual hash does the opposite: similar inputs produce similar hash values. Once you have two such hash values, you can calculate the distance between them, which tells you how much the underlying data differs. From that you can decide how small a distance still counts as a match.
I once used this library and was happy with the results: https://github.com/KilianB/JImageHash
When taking pictures, you usually take two or three of the same scene because someone has their eyes shut; this algorithm can find such near-duplicates. It does not even matter whether the image was converted to black and white or scaled down to a thumbnail.
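To make the idea concrete, here is a minimal sketch of one simple perceptual hash, the "average hash", written in plain Java. It is not the JImageHash API and not necessarily what that library does internally; the class and method names (`AverageHashSketch`, `averageHash`, `hammingDistance`) are made up for illustration, and the code assumes every image is at least 8x8 pixels.

```java
import java.awt.image.BufferedImage;

final class AverageHashSketch {

    // Integer brightness approximation (BT.601 weights), range 0..255.
    static int luminance(int rgb) {
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        return (r * 299 + g * 587 + b * 114) / 1000;
    }

    // 64-bit average hash: split the image into an 8x8 grid, average the
    // brightness of each cell, and set one bit for every cell that is
    // brighter than the mean of all cells.
    // Assumption: width and height are both at least 8.
    static long averageHash(BufferedImage img) {
        int w = img.getWidth();
        int h = img.getHeight();
        double[] cell = new double[64];
        double total = 0;
        for (int cy = 0; cy < 8; cy++) {
            for (int cx = 0; cx < 8; cx++) {
                int x0 = cx * w / 8, x1 = (cx + 1) * w / 8;
                int y0 = cy * h / 8, y1 = (cy + 1) * h / 8;
                long sum = 0;
                for (int y = y0; y < y1; y++) {
                    for (int x = x0; x < x1; x++) {
                        sum += luminance(img.getRGB(x, y));
                    }
                }
                cell[cy * 8 + cx] = (double) sum / ((x1 - x0) * (y1 - y0));
                total += cell[cy * 8 + cx];
            }
        }
        double mean = total / 64;
        long hash = 0L;
        for (int i = 0; i < 64; i++) {
            if (cell[i] > mean) {
                hash |= 1L << i;
            }
        }
        return hash;
    }

    // Number of differing bits: 0 means identical hashes, 64 means every bit differs.
    static int hammingDistance(long a, long b) {
        return Long.bitCount(a ^ b);
    }
}
```

You would hash every image once and then compare the hashes pairwise; a pair whose `hammingDistance` is below some cutoff is a candidate match. The cutoff is something to tune on your own data. Note that a hue change also shifts brightness somewhat, so it is worth checking on real samples that your recolored pairs still land close together.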
CodePudding user response:
From your explanation, small pixel differences do not matter, and neither does color.
Therefore you can simply go pixel by pixel through each image of a pair and count how many pixels are non-white. Then compare those counts. For similar images the counts will be similar (not necessarily equal).
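A minimal sketch of that counting approach follows. The names (`NonWhitePixelSketch`, `countNonWhite`, `maskMismatches`) and the `WHITE_THRESHOLD` value of 250 are assumptions for illustration, not something given in the question. The second method is a stricter variant that is my own addition: instead of comparing totals, it counts the positions where exactly one of the two images is white, and it assumes both images have the same dimensions.

```java
import java.awt.image.BufferedImage;

final class NonWhitePixelSketch {

    // Hypothetical cutoff: a pixel counts as white when every channel is at least this value.
    static final int WHITE_THRESHOLD = 250;

    static boolean isWhite(int rgb) {
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        return r >= WHITE_THRESHOLD && g >= WHITE_THRESHOLD && b >= WHITE_THRESHOLD;
    }

    // Total number of non-white pixels in one image; the hue of those pixels is ignored.
    static int countNonWhite(BufferedImage img) {
        int count = 0;
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                if (!isWhite(img.getRGB(x, y))) {
                    count++;
                }
            }
        }
        return count;
    }

    // Stricter variant: number of positions where exactly one image is white.
    // Assumption: both images have the same width and height.
    static int maskMismatches(BufferedImage a, BufferedImage b) {
        int mismatches = 0;
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                if (isWhite(a.getRGB(x, y)) != isWhite(b.getRGB(x, y))) {
                    mismatches++;
                }
            }
        }
        return mismatches;
    }
}
```

Comparing totals is cheap, since each image is scanned once, but two unrelated images can happen to have similar counts. The position-wise comparison avoids that at the cost of scanning every pair.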
If you removed the large white space around the digits, you could try this algorithm demo. It should work because it normalizes colors before comparing. The method is explained on the site.
Here is a screen-scraping application that does digit comparison; it may be relevant to your goal. Check how color thresholds are used in the function EucMetric in this file.