Home > front end >  What is the most time efficient way to calculate the distance between tuples in a list?
What is the most time efficient way to calculate the distance between tuples in a list?

Time:01-31

I have a list with tuples:

tuple_list = [(1,3),(4,7),(8,1),(5,4),(9,3),(7,2),(2,7),(3,1),(8,9),(5,2)]

From this list, I want to return the minimum distance of two numbers in a tuple.

In the naive approach, I would do the following:

distance = 10
for tup in tuple_list:
    if abs(tup[0]-tup[1]) < distance:
        distance = abs(tup[0]-tup[1])

Then, in the end, distance would equal 1.

However, I suspect there is a faster method to obtain the minimum distance that calculates all the distances in parallel.

CodePudding user response:

To be clear, in the CPython reference interpreter, parallelized computations are pretty useless; the GIL prevents you from gaining meaningful benefit from CPU-bound work like this unless the work can be done by an extension that manually releases the GIL, using non-Python types. numpy could gain you some benefit (if the data was already in a numpy array) by vectorizing (likely to do better than actual parallelization anyway, unless the data is enormous), but no matter how you slice it, the general case, for arbitrary data, will be O(n); you can't improve on that in the general case because every item must be considered, so even in ideal circumstances, you're just applying a constant divisor to the work, but it remains O(n).

You can simplify your code a bit, and use constructs that are better optimized in CPython, e.g.

distance = min(abs(d1 - d2) for d1, d2 in tuple_list)

which will compute abs(d1 - d2) only once per loop, and potentially save a little overhead over the plain for loop if check (plus, it'll remove the need to come up with an initializer for distance that's definitely larger than the minimum that should replace it), but it's still O(n), it's just simpler code with some minor micro-optimizations.

In some special cases you could improve on this though. If you must regularly modify the list, and must be able to quickly determine the smallest difference at any given point in time, you could use a heap with precomputed differences. Adding a new item, or removing the minimum item, in the heap would be O(log n) (constructing the heap in the first place being O(n)), and getting the current smallest item would be O(1) (it's always in index 0).

Constructing the heap in the first place:

import heapq

tuple_list = [(1,3),(4,7),(8,1),(5,4),(9,3),(7,2),(2,7),(3,1),(8,9),(5,2)]
tuple_heap = [(abs(a - b), (a, b)) for a, b in tuple_list]  # O(n) work
heapq.heapify(tuple_heap)  # O(n) work; tuple_heap.sort() would also work,
                           # but it would be O(n log n)

Adding a new item (where x and y are the items to add):

heapq.heappush(tuple_heap, (abs(x - y), (x, y)))  # O(log n)

Popping off the current smallest item:

diff, tup = heapq.heappop(tuple_heap)  # O(log n)
# Or to unpack values:
diff, (x, y) = heapq.heappop(tuple_heap)  # O(log n)

Getting values from current smallest item (without removing it):

diff, tup = tuple_heap[0]  # O(1)
# Or to unpack values:
diff, (x, y) = tuple_heap[0]  # O(1)

Obviously, this only make sense if you regularly need the current minimum item, and the set of things to consider is constantly changing, but it's one of the few cases where you can get better than O(n) performance in common cases, without paying more than O(n) costs in setup costs.

CodePudding user response:

The only way you can optimise this would be using multi-threaded solution, and calculating the tuple-distance for each tuple in a thread, you'll see probably a time advantage for large lists, but still, in terms of complexity it will be the same O(n). Since the solution you provided is already the most optimal, it has already a time complexity of O(n), and there isn't a more optimal approach to find a minimum in a list than O(n).

CodePudding user response:

You can use the sorted() function to sort the tuples in tuple_list by the difference between their elements, then use the min() function to find the minimum difference in the sorted list.

def min_distance(tuple_list):
    sorted_list = sorted(tuple_list, key=lambda x: abs(x[0]-x[1]))
    return abs(sorted_list[0][0]-sorted_list[0][1])

print(min_distance(tuple_list))
  • Related