Home > Net >  Find closest elements of two lists - the best way
Find closest elements of two lists - the best way

Time:02-10

I have two lists(or columns of dataframe - doesn't matter) of numerical values:

L1 = [1, 0.5, 3, 7, 4.7]
L2 = [2, 0.4, 8, 0.3, 5]

for example. I need to correlate these lists and find pairs(indexOfElementFromL1, indexOfElementFromL2) which corresponds min diff of two elements. For example, for my example it should be: (0,1), (1,1), (2,0), (3,2), (4,4). Really what I want - find closest element L2 to each one of L1. Of course I could naive approch like:

for el1 in L1:
  for el2 in L2:
    ....

But I'd to like to see more correct solution

CodePudding user response:

You can do it really fast by converting your lists to numpy arrays and using numpy broadcasting:

a1 = np.array(L1)
a2 = np.array(L2)

indexes = abs(a1[:, None] - a2).argmin(axis=1)
out = list(enumerate(indexes))

Output:

>>> indexes
array([1, 1, 0, 2, 4])

>>> out
[(0, 1), (1, 1), (2, 0), (3, 2), (4, 4)]

CodePudding user response:

I don't see how an answer could be 'more correct', but this is how I'd do it. I like it more than a cramped one liner because it's still readable.

L1 = [1, 0.5, 3, 7, 4.7]
L2 = [2, 0.4, 8, 0.3, 5]


def find_closest(x):
    l2_diffs = [abs(x - y) for y in L2]
    return l2_diffs.index(min(l2_diffs))


res = [(i, find_closest(x)) for i, x in enumerate(L1)]

print(res)

Output:

[(0, 1), (1, 1), (2, 0), (3, 2), (4, 4)]

CodePudding user response:

taken from another answer; though I do not know how to flag for such:

Find nearest value in numpy array

import numpy as np

L1 = [1, 0.5, 3, 7, 4.7]
L2 = [2, 0.4, 8, 0.3, 5]

def find_nearest(array, value):
    array = np.asarray(array)
    idx = (np.abs(array - value)).argmin()
    return array[idx]

for el1 in L1:
     print(find_nearest(L2, el1))

CodePudding user response:

You can use brute force methods using for loops or taking advantage of broadcasting, as you can see in other answers.

The problem comes when the lists are large. If that's the case, for loops will be slow and broadcasting will take a lot of memory.

The alternative is to use a search algorithm like FLANN:

>>> import numpy as np
>>> from pyflann import FLANN

>>> L1 = np.array([[1, 0.5, 3, 7, 4.7]]).T      # Data points as rows
>>> L2 = np.array([[2, 0.4, 8, 0.3, 5]]).T

>>> idx, _ = FLANN().nn(pts=L2, qpts=L1)
>>> idx
array([1, 1, 0, 2, 4], dtype=int32)

>>> list(enumerate(idx))
[(0, 1), (1, 1), (2, 0), (3, 2), (4, 4)]

Or, if you want to perform multiple queries:

>>> flann = FLANN()
>>> flann.build_index(pts=L2)           # Build search index
>>> idx, _ = flann.nn_index(qpts=L1)    # Query
  • Related