Home > OS >  How to find the minimum distances of each point from a of list of points to all the other points in
How to find the minimum distances of each point from a of list of points to all the other points in

Time:10-08

As seen in the picture I have an outlier and I would like to remove it(not the red one but the one above it in green, which is not aligned with other points) and hence I am trying to find the min distance and then try to eliminate it. But given the huge dataset it takes an eternity to execute. This is my code below. Appreciate any solution that helps, thanks! enter image description here

import math
#list of 11600 points
dataset = [[2478, 3534], [4217, 953],......,11600 points]  

copy_dataset = dataset

Indices =[]

Min_Dists =[]

Distance = []

Copy_Dist=[]

for p1 in range(len(dataset)):

    p1_x= dataset[p1][0]
    p1_y= dataset[p1][1]
    for p2 in range(len(copy_dataset)):
        
        p2_x= copy_dataset[p2][0]
        p2_y= copy_dataset[p2][1]

        dist = math.sqrt((p1_x - p2_x) ** 2   (p1_y - p2_y) ** 2)
        Distance.append(dist)
        Copy_Dist.append(dist)
        
    min_dist_1= min(Distance)
    Distance.remove(min_dist_1)
    
    if(min_dist_1 !=0):
        Min_Dists.append(min_dist_1)
        ind_1 = Copy_Dist.index(min_dist_1)
        Indices.append(ind_1)

    min_dist_2=min(Distance)
    Distance.remove(min_dist_2)    
    if(min_dist_2 !=0):
        Min_Dists.append(min_dist_2)
        ind_2 = Copy_Dist.index(min_dist_2)
        Indices.append(ind_2)

    To_Remove = copy_dataset.index([p1_x, p1_y])
    copy_dataset.remove(copy_dataset[To_Remove])

CodePudding user response:

Not sure how to solve this problem in general, but it's probably a lot faster to compute the distances in a vectorized fashion.

dataset_copy = dataset.copy()
dataset_copy = dataset_copy[:, np.newaxis]

distance = np.sqrt(np.sum(np.square(dataset - dataset_copy), axis=~0))

CodePudding user response:

If you need all distances at once:

distances = scipy.spatial.distance_matrix(dataset, dataset)

If you need distances of one point to all others:

for pt in dataset:
    distances = scipy.spatial.distance_matrix([pt], dataset)[0]
    # distances.min() will be 0 because the point has 0 distance to itself
    # the nearest neighbor will be the second element in sorted order
    indices = np.argpartition(distances, 1) # or use argsort for a complete sort
    nearest_neighbor = indices[1]

Documentation: distance_matrix, argpartition

  • Related