How can we find K closest points to the origin, and sort based on x-coordinates when there's eq-CodePudding

We are given a bunch of (x, y) coordinates and we need to find the K-Closest points to the origin. When we find a few points which have equal distance from origin and we need to consider between some of them, we take the points whose x-coordinate is the smallest. In case both the x-coordinates are same(that mean 2 points are equal) - we take either of them.

For example - we are given (1,1), (-2,3), (2, 3) and K = 2

So, the answer would be (1,1) and (-2, 3).

My approach was - generate a map between the distance vs location in original points list. And then get an array of distances and sort on that. Then we take each distance in the new sorted array and extact all keys from the map(hence the locations in the original array), if they're more than 1 and we haven't reached K yet -> we check the x-coordinate and add it to the result. But I couldn't implement this correctly, and I think this is not the most efficient way of doing it.

CodePudding user response：

You don't need to find all the distances and sort them. It's enough to keep a max heap that's at most of K length. You can have the x coordinates as the second parameter for the heap sorting algorithm.

Here's a python solution to the problem which you can treat as almost pseudo code:

import heapq

hp = []
for x, y in coordinates:
  dist = x**2   y**2
  heapq.heappush(hp, (-dist, -x, y))
  if len(hp) >= K:
    heapq.heappop(hp)

res = [(-x, y) for d, x, y in hp]

Please note that we could have used heapq.heappushpop for when len(hp) >= k. This is slightly more efficient, but it's beside the point since it's implementation specific.

Here's what we do:

Push tuples of (dist, x, y) on the heap. Note that since python's default heap implementation is a min heap, we need to negate the distance and x coordinate so that the bigger distances and x coordinates are at the top of the heap.
Whenever the length of the heap is >= K, we pop one out. This ensures that the coordinate with the longest distance is popped first (since it is a min heap and we are negating the distances). It also ensures that in case of a tie, the greater x coordinate is popped first.
At the end we extract the coordinates from the heap making sure to negate the x coordinate back.

The time complexity of this algorithm is O(n * log k) where n is the length of the coordinates list and k is the number of coordinates we want.

The space complexity is k.

CodePudding user response：

Alternative Approach will a Balltree. It can query the closest K points. I first generate 10K random points, integers.

import numpy as np
from sklearn.neighbors import BallTree

points = np.floor(np.random.normal(size=(10000,2), loc=(25,25), scale=(16,16)))

K = 12

Create a tree, and query it to find closest K points, to (0,0).

tree = BallTree(points, leaf_size=10, metric='euclidean')

distances, indici = tree.query( np.array([[0,0]]), k=K,return_distance=True)

Create a (distance, point) tuple, to be able to later sort as we please.

close_points_as_list = [ (x,y) for (x,y) in points[indici].tolist()[0] ]
distances_as_list = distances.tolist()[0]

distances_points_tuples = list(zip(distances_as_list, close_points_as_list))

for a p in the list distances_points_tuples, p[0] is the distance to (0,0), and p[1][0] is x, p[1][1] is y.

I was not sure what 'smallest' x is. Note I use np.abs() of x, to define 2nd level ordening.

sorted_points = sorted(distances_points_tuples, key=lambda x: (x[0], np.abs(x[1][0])) )

import pprint

pprint.pprint(sorted_points)

will give

[(0.0, (0.0, 0.0)),
 (1.4142135623730951, (1.0, -1.0)),
 (1.4142135623730951, (-1.0, -1.0)),
 (1.4142135623730951, (-1.0, 1.0)),
 (1.4142135623730951, (-1.0, 1.0)),
 (2.0, (0.0, -2.0)),
 (2.0, (0.0, 2.0)),
 (2.23606797749979, (-1.0, -2.0)),
 (2.23606797749979, (2.0, 1.0)),
 (2.23606797749979, (-2.0, 1.0)),
 (2.23606797749979, (-2.0, -1.0)),
 (2.23606797749979, (-2.0, 1.0))]