How to calculate the distances between all data points

I want to check which data points within X are close to each other and which are far apart. Is it possible to calculate the distances between all pairs of points without getting only zeros?

import numpy as np

X = np.random.rand(20, 10)
dist = (X - X) ** 2  # X - X subtracts every element from itself, so this is all zeros
print(X)

CodePudding user response：

One possible solution is SciPy's cdist, which computes the distance between every pair of rows (Euclidean by default):

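import numpy as np
from scipy.spatial.distance import cdist

X = np.random.rand(20, 10)
# dist[i, j] is the Euclidean distance between rows i and j; the diagonal is zero
dist = cdist(X, X)
print(dist)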
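
Since the question asks about avoiding zeros: the diagonal of the cdist(X, X) matrix holds the distance of each point to itself, which is always zero. A minimal sketch of one way to leave those entries out when looking for the closest and farthest pairs, using SciPy's pdist and squareform. The fixed seed and the variable names are only illustrative assumptions.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)  # fixed seed only so the run is repeatable
X = rng.random((20, 10))

# pdist returns each unordered pair once (i < j), so it has no self-distances
condensed = pdist(X)            # 20 * 19 / 2 = 190 distances
D = squareform(condensed)       # full symmetric 20 x 20 matrix, zero diagonal

# Put inf on the diagonal of a copy so argmin ignores the self-distances
D_masked = D.copy()
np.fill_diagonal(D_masked, np.inf)
i, j = np.unravel_index(np.argmin(D_masked), D_masked.shape)
print("closest pair:", i, j, D[i, j])

# The farthest pair needs no masking, because zero is never the maximum here
p, q = np.unravel_index(np.argmax(D), D.shape)
print("farthest pair:", p, q, D[p, q])
```

If only the list of distances is needed, the condensed output of pdist is enough on its own and uses about half the memory of the square matrix.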

CodePudding user response：

You can go through each pair of points in sequence:

import numpy as np

X = np.random.rand(20, 10)
no_points = X.shape[0]

distances = np.zeros((no_points, no_points))
for i in range(no_points):
    for j in range(no_points):
        # Euclidean distance between point i and point j
        distances[i, j] = np.linalg.norm(X[i, :] - X[j, :])

print(distances, np.max(distances))
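
The same matrix can probably be built without the explicit Python loops by letting NumPy broadcasting form all pairwise differences at once. This is only a sketch of that alternative, not part of the answer above; the seed is an assumption added for repeatability.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed only so the run is repeatable
X = rng.random((20, 10))

# X[:, None, :] has shape (20, 1, 10) and X[None, :, :] has shape (1, 20, 10);
# broadcasting the subtraction gives every pairwise difference, shape (20, 20, 10)
diff = X[:, None, :] - X[None, :, :]

# Euclidean norm over the last (feature) axis leaves a 20 x 20 distance matrix
distances = np.linalg.norm(diff, axis=-1)

print(distances.shape)
```

The intermediate array grows with the square of the number of points times the number of features, so for large inputs the loop or a SciPy routine may be the safer choice.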

CodePudding user response：

I assume you want a way to keep track of the distances, correct? If so, you can build a dictionary whose keys are the distances and whose values are lists of tuples identifying the pairs of points at that distance. Iterating over the keys in ascending order then gives the distances from smallest to largest, along with the points that correspond to each one. One way to do this is to brute-force every possible pair of points.

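import numpy as np

dist = dict()
X = np.random.rand(20, 10)
for indexOfNumber1 in range(len(X) - 1):
   # start after indexOfNumber1 so each pair is visited once and no point is paired with itself
   for indexOfNumber2 in range(indexOfNumber1 + 1, len(X)):
      # Euclidean distance: sum the squared differences before taking the square root
      distance = np.sqrt(np.sum((X[indexOfNumber1] - X[indexOfNumber2]) ** 2))
      # identify the pair by its row indices; setdefault creates the list on first use
      dist.setdefault(distance, []).append((indexOfNumber1, indexOfNumber2))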

After the loop, the dictionary dist contains every pairwise distance among the points, together with the pairs of points that are that far apart.
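
As a possible alternative to a dictionary keyed by floating-point distances, the pairs can be sorted directly. The sketch below assumes SciPy is available and relies on pdist listing the pairs in the same row-major order as np.triu_indices with k=1; the names rows, cols, and sorted_pairs are illustrative.

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)  # fixed seed only so the run is repeatable
X = rng.random((20, 10))

# Indices (i, j) with i < j, in the same order pdist uses for its output
rows, cols = np.triu_indices(len(X), k=1)
d = pdist(X)

# argsort gives the positions that put the distances in ascending order
order = np.argsort(d)
sorted_pairs = [(int(rows[k]), int(cols[k]), float(d[k])) for k in order]

print(sorted_pairs[:3])   # the three closest pairs
print(sorted_pairs[-3:])  # the three farthest pairs
```

Each entry is (index of first point, index of second point, distance), so the closest points are at the front of the list and the farthest at the end.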