Sorry for my bad english, i have a small database that contains hashes of photos, when I try to find similar photos to the one below:
for which the following hash was calculated: "0f3f2764ecc482c2" using the method average_hash()
imagehash.average_hash(Image.open(imagePath))
The system finds a very large number of collisions, below is an example of photos that were identified as completely identical:
The table in which I store hashes of photos:
CREATE TABLE IF NOT EXISTS photos(id_photo BIGINT PRIMARY KEY, photo_hash TEXT, FOREIGN KEY (id_photo) REFERENCES users (id));
Adding hash:
cur.execute('''INSERT INTO photos(id_photo, photo_hash) VALUES(%s, %s)''', id, str(result_image_recognition['photo_hash'])])
SQL Query by which I calculate the Euclidean distance between my hash and stored hashes:
SELECT id_photo, BIT_COUNT(photo_hash ^ '0f3f2764ecc482c2') FROM photos;
The photos table contains 7889 photos, 959 of them are mistakenly determined by this query as completely identical (Euclidean distance is 0). About a week I can not solve this problem please someone help me.
CodePudding user response:
You need to convert the hex strings to integers before doing xor operations
SELECT id_photo, BIT_COUNT(CONV(photo_hash, 16, 10) ^ CONV('0f3f2764ecc482c2', 16, 10)) FROM photos;
because all strings whose first character is not 1-9 are converted to 0.
mysql> select 'abc' ^ 'def'; # -> 0
mysql> select CONV('abc', 16, 10) ^ CONV('def', 16, 10); # -> 1875