Home > OS >  compute all n-th closest points of all points in a dataset
compute all n-th closest points of all points in a dataset

Time:11-17

I have dataset of 1000 points on a plane. I represented all the possible pairs of points in P and calculated the distances of all possible pairs. What I have to do is: For a given n, calculate all the n-th nearest points for all points p in P.

What I did before:

P_pairs = [((33, 9), (34, 13)), ((33, 9), (62, 119)), ((33, 9), (33, 7)), ((33, 9), (48, 123)), ...]

listofdistances =  [{'((33, 9), (34, 13))': 4.123105625617661}, {'((33, 9), (62, 119))': 113.75851616472501}, {'((33, 9), (33, 7))': 2.0}, ...]

In this context, I am stuck in sorting listofdistances such that for every point, there are the minimum n distances as values left.

Maybe I have to calculate the n-th nearest points directly, instead of calculating all the distances of the points. But I don't exactly know how.

CodePudding user response:

Creating a list of all possible pairs and then a list of single key dictionaries with distances as values does create a sorting headache. I would instead vectorize this job and use numpy.

import numpy as np

P = np.array([(33, 9), (34, 13), (62, 119), ...])

# Finds the n closest points to p in P
def n_closest_points(p, P, n)
    p_vector = np.tile(p, (len(P), 1))
    dists = np.linalg.norm(P-p_vector, axis=1)
    sorted_dists = np.sort(dists)

    # Exclude the 0th element as the distance from p to itself is 0
    return sorted_dists[1:n 1] 

CodePudding user response:

P = [(33, 9), (34, 13), (62, 119), (33, 7), (48, 123)]
P = np.array(P)

x, y = P[:,0], P[:,1]
# Create a distance table of point (row) vs point (column)
dist = np.sqrt((x - x[:,None])**2   (y - y[:,None])**2)
# The diagonals are 0, as the distance of a point to itself is 0,
# but we want that to have a large value so it comes last in sorting
np.fill_diagonal(dist, np.inf)
# Get the sorted index for each row
idx = dist.argsort(axis=1)

Now if you want the nth nearest neighbours, with n = 3, you get that with idx = idx[:,:3]. And for the first point you can now do

P[0]             # the point itself
P[idx[0]]        # its nearest neighbours
dist[0,idx[0]]   # their distances
  • Related