Python compare images of, piece of, clothing (identification)

Time: 01-28

As an example, I have two pictures of a particular piece of clothing from a certain brand. I can download many different images of this same item in the same color.


I want to create a model that can recognize the item from a picture. I tried this example: https://www.tensorflow.org/tutorials/keras/classification. It can recognize the type of clothing (e.g., shirt, shoe, trousers), but not a specific item and color. My goal is to have a model that can tell me that the person in my first picture is wearing the item in my second picture. As mentioned, I can upload a few variations of the same item to train my model, if that would be the best approach.

I also tried https://pillow.readthedocs.io. It can help with color recognition, but it does not solve my actual goal.

CodePudding user response:

I don't think a CNN will help with your problem. Take a look at the SIFT technique (see this for more details); it is used for image matching, and I think it fits your case better. If you are not looking to get into too much detail, OpenCV is a Python (and C++, I think) library with image-matching functions that are easy to use (more details).

CodePudding user response:

As mentioned by @nadji mansouri, I would use the SIFT technique, as it suits your need. But I want to correct one thing: a CNN is also an option here. That said, I wouldn't frame this as a classification problem, but rather use Distance Metric Learning, i.e., training a model to generate embeddings that are close together in the embedding space when the inputs are similar, and distant otherwise. To do this, however, you need a large, representative dataset.

In short, I suggest starting with SIFT using OpenCV (or open-source implementations on GitHub), playing with the parameters to see what fits your case best, and then deciding whether it's really necessary to switch to a neural network; in that case, tackle the problem as a metric-learning task, perhaps with something like Siamese networks.
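To illustrate the metric-learning route at inference time: once an embedding model exists, identifying a garment is just a nearest-neighbour search over catalogue embeddings. This is a sketch in plain NumPy; `embed` is a hypothetical stand-in (a fixed random projection) for a trained encoder such as one branch of a Siamese network:

```python
# Retrieval step of metric learning, with a fake encoder standing in
# for a trained network. With real training, two photos of the same
# garment map to nearby vectors; nearest-neighbour search finds the item.
import numpy as np

rng = np.random.default_rng(0)
PROJ = rng.normal(size=(3 * 32 * 32, 64))  # stand-in for a trained encoder

def embed(image):
    """Map a (3, 32, 32) image to an L2-normalised 64-d embedding."""
    v = image.reshape(-1) @ PROJ
    return v / np.linalg.norm(v)

def identify(query, catalogue):
    """Return the index of the catalogue item closest to the query image."""
    q = embed(query)
    sims = [q @ embed(img) for img in catalogue]  # cosine similarity
    return int(np.argmax(sims))

# Demo: the query is a lightly perturbed copy of catalogue item 1
catalogue = [rng.random((3, 32, 32)) for _ in range(3)]
query = catalogue[1] + 0.01 * rng.random((3, 32, 32))
print(identify(query, catalogue))  # → 1
```

The design point is that the classifier never needs to have seen the specific item during training: adding a new garment just means adding its embedding to the catalogue.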

Some definitions:

Metric learning is an approach based directly on a distance metric that aims to establish similarity or dissimilarity between data (images in your case). Deep Metric Learning on the other hand uses Neural Networks to automatically learn discriminative features from the data and then compute the metric. source.
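The objective such a model is trained with can be made concrete with a contrastive loss (in the style of Hadsell et al.); this is a minimal NumPy sketch, not a full training loop, and the margin value is an illustrative assumption:

```python
# Illustrative contrastive loss: pull embeddings of the same item
# together, push embeddings of different items apart up to a margin.
import numpy as np

def contrastive_loss(e1, e2, same_item, margin=1.0):
    """same_item: 1 if both embeddings show the same garment, else 0."""
    d = np.linalg.norm(e1 - e2)  # Euclidean distance in embedding space
    if same_item:
        return 0.5 * d ** 2                 # positive pair: pull together
    return 0.5 * max(0.0, margin - d) ** 2  # negative pair: push apart

a = np.array([0.1, 0.2])
b = np.array([0.1, 0.25])  # near-duplicate: tiny loss for a positive pair
c = np.array([2.0, 2.0])   # far away: zero loss for a negative pair
print(contrastive_loss(a, b, 1))  # small
print(contrastive_loss(a, c, 0))  # 0.0, distance already exceeds the margin
```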

The Scale-Invariant Feature Transform (SIFT) is a method used in computer vision to detect and describe local features in images. The algorithm is invariant to image scale and rotation, and robust to changes in illumination and affine distortion. SIFT features are represented by local image gradients, which are calculated at various scales and orientations, and are used to identify keypoints in an image. These keypoints and their associated descriptor vectors can then be used for tasks such as image matching, object recognition, and structure from motion. source, with modification.
