Algorithms and data structures for comparing things and deciding how similar they are?

Which algorithms and/or data structures can be applied to decide how similar two things are, based on some common characteristics?

What area of knowledge deals with this type of problem?

One way of doing this could be:

- Each int value represents some characteristic.

- Each set of ints represents a group of characteristics within a feature, for example:

Object_1: {1, 2, 3}, {11, 14}, {88, 90}

Object_2: {4, 7}, {12, 16}, {81, 91}

Search Term: {2, 90}

Search should return 'Object_1' because {2, 90} is a subset of {1, 2, 3, 11, 14, 88, 90}, the union of Object_1's sets.
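The search described above could be sketched in Python roughly as follows. This is only a minimal illustration: the names `objects`, `flatten`, `search`, and `overlap` are made up for this example, and the `overlap` score is an added assumption (not part of the example above) showing one way to rank partial matches.

```python
from typing import Dict, List, Set

# Each object is a list of feature groups; each group is a set of int characteristics.
objects: Dict[str, List[Set[int]]] = {
    "Object_1": [{1, 2, 3}, {11, 14}, {88, 90}],
    "Object_2": [{4, 7}, {12, 16}, {81, 91}],
}


def flatten(groups: List[Set[int]]) -> Set[int]:
    """Union all feature groups of one object into a single set."""
    return set().union(*groups)


def search(term: Set[int], candidates: Dict[str, List[Set[int]]]) -> List[str]:
    """Return the names of objects whose combined characteristics contain every value in term."""
    return [name for name, groups in candidates.items() if term <= flatten(groups)]


def overlap(term: Set[int], groups: List[Set[int]]) -> float:
    """Hypothetical partial-match score: fraction of the term's values found in the object."""
    if not term:
        return 0.0
    return len(term & flatten(groups)) / len(term)


print(search({2, 90}, objects))  # ['Object_1']
```

An exact subset test only answers yes or no; a score such as `overlap` would let near misses be ranked instead of dropped.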

Hope this example narrows the question down a bit.

CodePudding user response：

There are many different types of similarity measures out there. To decide which one to use, the first step is to identify the level of measurement and the type of your data. Here are a few similarity measures for categorical and continuous data:

For categorical data:
- Hamming distance
- Sokal–Michener
- Russell–Rao

For continuous data:
- Minkowski-based distances, e.g., Euclidean distance, Manhattan distance
- Mahalanobis distance
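To make the measures listed above a bit more concrete, here is a rough sketch of a few of them in plain Python. The function names are made up for this example, and the categorical measures assume binary (0/1) feature vectors of equal length. Mahalanobis distance is left out because it needs an inverse covariance matrix estimated from the data.

```python
from typing import Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which the two vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def sokal_michener(a: Sequence[int], b: Sequence[int]) -> float:
    """Simple matching: share of positions where both vectors agree (1/1 or 0/0)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def russell_rao(a: Sequence[int], b: Sequence[int]) -> float:
    """Share of positions where both vectors are 1 (co-presence only)."""
    return sum(x == 1 and y == 1 for x, y in zip(a, b)) / len(a)


def minkowski(a: Sequence[float], b: Sequence[float], p: float) -> float:
    """Minkowski distance; p=1 gives Manhattan, p=2 gives Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


u = [1, 0, 1, 1, 0]
v = [1, 1, 0, 1, 0]
print(hamming(u, v), sokal_michener(u, v), russell_rao(u, v))
print(minkowski([0, 0], [3, 4], 1), minkowski([0, 0], [3, 4], 2))
```

Note that Hamming and Minkowski are distances (smaller means more similar), while Sokal–Michener and Russell–Rao are similarity indexes (larger means more similar).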

The general algorithm used for similarity-based learning is the nearest neighbor algorithm. For more information, you can refer to John D. Kelleher's book.
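As a rough idea of what nearest neighbor looks like in practice, here is a minimal sketch that assumes continuous features and Euclidean distance. The function name `nearest_neighbor` and the sample data are invented for illustration; any of the measures above could be swapped in for `math.dist`.

```python
import math
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], str]  # (feature vector, label)


def nearest_neighbor(query: Sequence[float], examples: List[Example]) -> str:
    """Return the label of the stored example closest to the query (1-NN, Euclidean)."""
    if not examples:
        raise ValueError("need at least one stored example")
    _, label = min(examples, key=lambda ex: math.dist(query, ex[0]))
    return label


# Made-up training data: two labeled points in a 2-D feature space.
training = [((1.0, 1.0), "A"), ((5.0, 5.0), "B")]
print(nearest_neighbor((1.5, 2.0), training))
```

This scans every stored example for each query, which is fine for small data sets; larger ones usually need an index structure such as a k-d tree.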