Find Least Similar Vectors Python and Numpy

Time:09-16

If I have a list of arrays, e.g.:

[0,1,1,0,1,1,0]
[1,0,1,0,0,0,1]
[1,0,1,0,1,1,1]
[1,0,1,0,1,0,1]
[0,1,0,0,0,0,0]
[1,0,0,0,0,0,1]
[1,0,1,0,1,1,1]
[1,0,1,0,0,0,1]

and I want to find the n arrays that are least like the others, what would be the best method?

E.g., I want the two arrays that are least similar to the group as a whole.

CodePudding user response:

You would first need to make some methodology choices before you can actually implement a solution.

  1. How do you define "most different"? You need to choose or define a distance measure. The appropriate measure depends on the problem you are trying to solve.
  2. How do you define the difference between a single array and the group of arrays? For example, you could leave one array out, take the average of the rest, and compute the distance between the array you left out and that average. Alternatively, you could compute the distance between all pairs of arrays in the group and then choose the two whose average distance to the rest is the largest.
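
As a rough illustration of the leave-one-out idea in (2), here is a minimal NumPy sketch. The function name `leave_one_out_scores` is made up for this example, and Euclidean distance is used purely as a placeholder for whatever measure you settle on in (1).

```python
import numpy as np

# The arrays from the question, one per row.
arrays = np.array([
    [0, 1, 1, 0, 1, 1, 0],
    [1, 0, 1, 0, 0, 0, 1],
    [1, 0, 1, 0, 1, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 1],
    [1, 0, 1, 0, 1, 1, 1],
    [1, 0, 1, 0, 0, 0, 1],
])

def leave_one_out_scores(arrays):
    """Distance of each row from the mean of all the other rows.

    Hypothetical helper; Euclidean distance is an assumption, not a
    recommendation.
    """
    X = np.asarray(arrays, dtype=float)
    m = X.shape[0]
    # Mean of the other rows for every row i: (column totals - row i) / (m - 1)
    rest_mean = (X.sum(axis=0) - X) / (m - 1)
    return np.linalg.norm(X - rest_mean, axis=1)

scores = leave_one_out_scores(arrays)
# Indices of the two rows that sit farthest from the rest, most different first.
top_two = np.argsort(scores)[::-1][:2].tolist()
print(top_two)
```

On the question's data this picks rows 4 and 0 (zero-based), i.e. `[0,1,0,0,0,0,0]` and `[0,1,1,0,1,1,0]`. A different distance measure could rank the rows differently.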

Some ideas for (1):

  • Hamming distance, which counts the number of entries that do not match between two arrays. Since your example is binary, it may be appropriate.
  • The L2 norm of the difference of the vectors (the square root of the sum of squared differences between corresponding entries). Probably the most popular, at least in some domains. Note that it is more sensitive to outliers, which you may or may not want. You can also compute the L1 norm instead, which is the sum of the absolute differences and, in the binary case, matches the Hamming distance.
  • Many, many more. Try searching for "distance measures for arrays" or "clustering distance measures". It really comes down to the interpretation of your data.
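
To make the first two options concrete, here is a small sketch of the three measures in NumPy. The helper names `hamming`, `l1` and `l2` are illustrative only, and the code assumes signed-integer or float arrays (subtracting unsigned arrays would wrap around).

```python
import numpy as np

def hamming(a, b):
    # Number of positions where the two arrays disagree.
    return int(np.count_nonzero(a != b))

def l1(a, b):
    # Sum of absolute differences.
    return float(np.abs(a - b).sum())

def l2(a, b):
    # Square root of the sum of squared differences (Euclidean distance).
    return float(np.linalg.norm(a - b))

# First two arrays from the question.
a = np.array([0, 1, 1, 0, 1, 1, 0])
b = np.array([1, 0, 1, 0, 0, 0, 1])

print(hamming(a, b), l1(a, b), l2(a, b))
```

The two arrays differ in five positions, so the Hamming and L1 distances are both 5 and the L2 distance is the square root of 5.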

Once you have chosen a methodology for (1) and (2), the implementation should not be too difficult.
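
For instance, one possible combination is Hamming distance for (1) and the all-pairs average for (2). The sketch below assumes exactly that; `least_similar` is a hypothetical name, and the broadcasting approach builds an m × m × d intermediate array, so it is only suitable for modestly sized inputs.

```python
import numpy as np

def least_similar(arrays, n=2):
    """Return the indices of the n rows with the largest average
    Hamming distance to all other rows, plus the averages themselves.

    Hypothetical helper: one of many reasonable definitions.
    """
    X = np.asarray(arrays)
    m = X.shape[0]
    # Pairwise Hamming distances via broadcasting:
    # (m, 1, d) != (1, m, d) -> (m, m, d), then count mismatches per pair.
    dist = (X[:, None, :] != X[None, :, :]).sum(axis=2)
    # Average over the other m - 1 rows (the diagonal contributes zero).
    avg = dist.sum(axis=1) / (m - 1)
    order = np.argsort(avg)[::-1][:n]
    return order.tolist(), avg

# The arrays from the question, one per row.
arrays = [
    [0, 1, 1, 0, 1, 1, 0],
    [1, 0, 1, 0, 0, 0, 1],
    [1, 0, 1, 0, 1, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 1],
    [1, 0, 1, 0, 1, 1, 1],
    [1, 0, 1, 0, 0, 0, 1],
]

idx, avg = least_similar(arrays, n=2)
print(idx)
```

With the question's data, the two rows returned are index 4 and index 0 (zero-based). If several rows tie on average distance, the order among them depends on `np.argsort`, so you may want an explicit tie-breaking rule.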
