I have dataset of 1000 points on a plane. I represented all the possible pairs of points in P and calculated the distances of all possible pairs. What I have to do is: For a given n, calculate all the n-th nearest points for all points p in P.
What I did before:
P_pairs = [((33, 9), (34, 13)), ((33, 9), (62, 119)), ((33, 9), (33, 7)), ((33, 9), (48, 123)), ...]
listofdistances = [{'((33, 9), (34, 13))': 4.123105625617661}, {'((33, 9), (62, 119))': 113.75851616472501}, {'((33, 9), (33, 7))': 2.0}, ...]
In this context, I am stuck in sorting listofdistances
such that for every point, there are the minimum n distances as values left.
Maybe I have to calculate the n-th nearest points directly, instead of calculating all the distances of the points. But I don't exactly know how.
CodePudding user response:
Creating a list of all possible pairs and then a list of single key dictionaries with distances as values does create a sorting headache. I would instead vectorize this job and use numpy.
import numpy as np
P = np.array([(33, 9), (34, 13), (62, 119), ...])
# Finds the n closest points to p in P
def n_closest_points(p, P, n)
p_vector = np.tile(p, (len(P), 1))
dists = np.linalg.norm(P-p_vector, axis=1)
sorted_dists = np.sort(dists)
# Exclude the 0th element as the distance from p to itself is 0
return sorted_dists[1:n 1]
CodePudding user response:
P = [(33, 9), (34, 13), (62, 119), (33, 7), (48, 123)]
P = np.array(P)
x, y = P[:,0], P[:,1]
# Create a distance table of point (row) vs point (column)
dist = np.sqrt((x - x[:,None])**2 (y - y[:,None])**2)
# The diagonals are 0, as the distance of a point to itself is 0,
# but we want that to have a large value so it comes last in sorting
np.fill_diagonal(dist, np.inf)
# Get the sorted index for each row
idx = dist.argsort(axis=1)
Now if you want the nth nearest neighbours, with n = 3, you get that with idx = idx[:,:3]
. And for the first point you can now do
P[0] # the point itself
P[idx[0]] # its nearest neighbours
dist[0,idx[0]] # their distances