I have a directory of about 100,000 images, and I have an image that I know has a similar copy somewhere in that directory, saved in a different format and scaled differently, but I don't know where.
I want to find that image and learn its filename inside this directory.
I am looking for a mostly ready-made solution, but I couldn't find one. I found OpenCV, but I would need to write code around it. Is there an existing project that does this?
If there isn't, could you help me make a simple C# console app using OpenCV? I tried their templates but never managed to get SURF or CudaSURF working.
Thanks
Edited as per @Mark Setchell's comment
CodePudding user response:
If the image is identical, the fastest way is to get the file size of the image you are looking for and compare it with the file sizes of the images amongst which you are searching.
I suggest this first because, as Christoph clarifies in the comments, it doesn't require reading the file at all - it is just metadata.
If that yields more than one match, calculate a hash (MD5 or another) of each candidate and pick the file whose hash equals that of the image you are looking for.
Again, as mentioned by Christoph in the comments, this doesn't require decoding the image, or holding the decompressed image in RAM, just checksumming it.
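The two steps above could be sketched in Python roughly as follows. This is a minimal sketch rather than a tested tool: the names `file_digest` and `find_identical` are my own, and SHA-256 is used as the hash here (MD5 would work the same way). It only finds byte-identical copies, so it will not match an image that was re-encoded or rescaled.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Hash a file in chunks so it never has to fit in RAM at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_identical(query_path, directory):
    """Return the paths in `directory` that are byte-identical to `query_path`."""
    query_size = os.path.getsize(query_path)
    # Step 1: compare sizes only (metadata, no file contents are read).
    candidates = [
        entry.path
        for entry in os.scandir(directory)
        if entry.is_file() and entry.stat().st_size == query_size
    ]
    if not candidates:
        return []
    # Step 2: hash only the files whose size matched.
    query_hash = file_digest(query_path)
    return [path for path in candidates if file_digest(path) == query_hash]
```

Calling `find_identical("query.png", "images")` (both paths are placeholders) would return a list that is empty when no exact copy exists.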
CodePudding user response:
In the end I took the Python code from this site and modified it to search a whole directory instead of a single image. There is not much code, so the full script is below:
import argparse
import cv2
from os import listdir
from os.path import isfile, join

ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", type=str, required=True,
    help="path to the directory of images to search")
ap.add_argument("-t", "--template", type=str, required=True,
    help="path to template image")
args = vars(ap.parse_args())

# load the template image from disk
print("[INFO] loading template...")
template = cv2.imread(args["template"])
cv2.namedWindow("Output")
cv2.startWindowThread()

# display the template until a key is pressed
cv2.imshow("Output", template)
cv2.waitKey(0)

# convert the template to grayscale
templateGray = cv2.cvtColor(template, cv2.COLOR_BGR2GRAY)

# collect the names of all regular files in the search directory
imageFileNames = [f for f in listdir(args["image"]) if isfile(join(args["image"], f))]

for imageFileName in imageFileNames:
    try:
        imagePath = join(args["image"], imageFileName)
        print("[INFO] Loading " + imagePath + " from disk...")
        image = cv2.imread(imagePath)
        print("[INFO] Converting " + imageFileName + " to grayscale...")
        imageGray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        print("[INFO] Performing template matching for " + imageFileName + "...")
        result = cv2.matchTemplate(imageGray, templateGray,
            cv2.TM_CCOEFF_NORMED)
        (minVal, maxVal, minLoc, maxLoc) = cv2.minMaxLoc(result)
        (startX, startY) = maxLoc
        endX = startX + template.shape[1]
        endY = startY + template.shape[0]
        if maxVal > 0.75:
            print("maxVal = " + str(maxVal))
            # draw the bounding box on the image
            cv2.rectangle(image, (startX, startY), (endX, endY), (255, 0, 0), 3)
            # show the output image
            cv2.imshow("Output", image)
            cv2.waitKey(0)
            cv2.imshow("Output", template)
    except KeyboardInterrupt:
        break
    except Exception:
        # unreadable files, or images smaller than the template, end up here
        print(imageFileName)
        print("Error")
cv2.destroyAllWindows()
The code above shows any image whose match value (as far as I understand, the normalized correlation between the template and the best-matching region of the image, where 1.0 is a perfect match) is greater than 0.75. That threshold is probably still too low, so tweak it to your liking if you use this. Note that this WILL NOT work if the image is rotated, and if, like me, you have a bright light source in the template, other light sources will come up as false positives.
As for time, it took me about 7 hours, with the script pausing about every 20 minutes for a false positive, until I found my image. By then I had got through about 2/3 of all the images.
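One way to avoid stopping at every false positive might be to record each score and review only the best few at the end. The sketch below assumes a hypothetical `score_fn(path)` that returns the match value for one file (for example `maxVal` from `cv2.matchTemplate`); `top_matches` and its parameters are illustrative names, not part of OpenCV.

```python
import heapq


def top_matches(paths, score_fn, keep=20):
    """Return the `keep` highest-scoring (score, path) pairs, best first.

    `score_fn` is a hypothetical callable that returns a similarity score
    for one path, for example maxVal from cv2.matchTemplate.
    """
    best = []  # min-heap holding the `keep` best results seen so far
    for path in paths:
        try:
            score = score_fn(path)
        except Exception:
            continue  # skip files that cannot be scored
        item = (score, path)
        if len(best) < keep:
            heapq.heappush(best, item)
        elif item > best[0]:
            heapq.heapreplace(best, item)
    return sorted(best, reverse=True)
```

With this, the script could run unattended and only the final short list would need to be inspected by eye.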
As a side note, it took 10 minutes just to build the list of files in the directory, and the script used about 500 MB of RAM once that was done.
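The listing cost could probably be reduced by walking the directory lazily instead of building the full list first. Here is a minimal sketch using `os.scandir` from the standard library, which yields entries one at a time and can usually answer `is_file()` without an extra system call per file. The name `iter_image_paths` and the extension list are my own assumptions, and this has not been timed against a directory of that size.

```python
import os

# Assumed set of extensions; adjust to whatever the directory contains.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}


def iter_image_paths(directory, extensions=IMAGE_EXTENSIONS):
    """Yield the paths of image files in `directory` one at a time."""
    with os.scandir(directory) as entries:
        for entry in entries:
            ext = os.path.splitext(entry.name)[1].lower()
            if entry.is_file() and ext in extensions:
                yield entry.path
```

The main loop could then iterate over `iter_image_paths(args["image"])` directly, so matching starts as soon as the first entry is read.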
This is not the best answer, so if anyone more qualified finds this, feel free to write another one.