Home > Software engineering >  Numpy: Efficient search for an array into the other one
Numpy: Efficient search for an array into the other one

Time:09-22

I have got two arrays like those below:

first_array = np.array([2,2,2,10,10,15,20,20,20,20])
second_array = np.array([15,5,10,78,2,44,20,2,66,1,10,15,40,85,71,23,20,45,29,20,1])

I want to search for each element of the first array inside the second one to end up with a 2D array which includes the indexes of the second array and the values searched.

The codes below are working and giving what I desire to me, but they also seem definitely inefficient to me. There must exist an index operation or some different approaches rather than applying element by element search (loop).

out = []
for i in first_array:
    index = np.argwhere(second_array == i)
    out.append(np.array([*index.T,np.ones(len(index))*i]))
    
np.hstack(out).T

Here is the desired output for those who don't want to run the codes.

desired_output = np.array([[4,7,4,7,4,7,2,10,2,10,0,11,6,16,19,6,16,19,6,16,19,6,16,19],
                         [2,2,2,2,2,2,10,10,10,10,15,15,20,20,20,20,20,20,20,20,20,20,20,20]]).T

Thanks in advance!

CodePudding user response:

It is easy to think of two solutions:

  1. Broadcast comparison, but this is still the O(mn) algorithm (m = len(first_array) and n = len(second_array)):
>>> i, j = (first_array[:, None] == second_array).nonzero()
>>> np.array([j, first_array[i]]).T
array([[ 4,  2],
       [ 7,  2],
       [ 4,  2],
       [ 7,  2],
       [ 4,  2],
       [ 7,  2],
       [ 2, 10],
       [10, 10],
       [ 2, 10],
       [10, 10],
       [ 0, 15],
       [11, 15],
       [ 6, 20],
       [16, 20],
       [19, 20],
       [ 6, 20],
       [16, 20],
       [19, 20],
       [ 6, 20],
       [16, 20],
       [19, 20],
       [ 6, 20],
       [16, 20],
       [19, 20]], dtype=int64)
  1. Use the dictionary to build the mapping between elements and indices. If don't consider the final concatenating result step, this is the O(m n) algorithm. However, in the current example, the performance is not as good as the first solution:
>>> mp = {}
>>> for i, val in first_array:
...     mp.setdefault(val, []).append(i)
...
>>> np.concatenate([(indices := mp[val], [val] * len(indices))
...                 for val in first_array], -1).T
array([[ 4,  2],
       [ 7,  2],
       [ 4,  2],
       [ 7,  2],
       [ 4,  2],
       [ 7,  2],
       [ 2, 10],
       [10, 10],
       [ 2, 10],
       [10, 10],
       [ 0, 15],
       [11, 15],
       [ 6, 20],
       [16, 20],
       [19, 20],
       [ 6, 20],
       [16, 20],
       [19, 20],
       [ 6, 20],
       [16, 20],
       [19, 20],
       [ 6, 20],
       [16, 20],
       [19, 20]])
  • Related