How to find location of numpy array in larger array?-CodePudding

Given two Numpy arrays, A and B, how would you find the index where B occurs in A, allowing for some amount of noise?

For example:

>>> A = [1.2, 4.5, 18.1, 19.1, 3.3, 7.4, 9.5, 1.0, 6.5, 4.9, 2.4]
>>> B = [19.15, 3.35, 7.3]
>>> find_position(A, B)
3

The naive implementation of find_position(a, b) would be to just just loop over every index in A, and then from there iterate over B, calculating it's Euclidean distance for each pair of numbers from A and B, tracking the index that has the smallest distance.

Something like:

def find_position(a, b):
    """
    Finds index of b in a.
    """
    best = (1e999999, None)
    assert a.size >= b.size
    for i in range(a.size - b.size):
        score = sum(abs(b[j] - a[i j]) for j in range(b.size))
        best = min(best, (score, i))
    return best[1]

I'm guessing this is far from the most efficient solution. I'm not sure what the exact Big-O notation would be, but it's probably close to O(M*N), so for large arrays, this would take forever.

Is there a more efficient approach, or some method built-in to Numpy that makes those nested for loops a bit faster?

CodePudding user response：

Refered to followed Link, a numpy solution.

import numpy as np

def indexTol(a, b, tol=1e-6):
    a = np.unique(a)
    intervals = np.empty(2*a.size, float)
    intervals[::2] = a - tol
    intervals[1::2] = a   tol
    overlaps = intervals[:-1] >= intervals[1:]
    overlaps[1:] = overlaps[1:] | overlaps[:-1]
    keep = np.concatenate((~overlaps, [True]))
    intervals = intervals[keep]
    return np.searchsorted(intervals, b, side='right') & 1 == 1


A = np.array([1.2, 4.5, 18.1, 19.1, 3.3, 7.4, 9.5, 1.0, 6.5, 4.9, 2.4])
B = np.array([19.15, 3.35, 7.3])

np.nonzero(indexTol(B,A,tol=0.05))

(array([3], dtype=int64),)

CodePudding user response：

You could reduce complexity to O(log(M) * N) if you calculate distances between points of B and it closest companions in A from both left and right sides only:

def find_position(A, B):
    #find indices that sorts an array
    argidx = np.argsort(A)
    
    # find two set of indices with respect to sorted array
    next_idx = np.searchsorted(A, B, sorter=arg_idx)
    prev_idx = next_idx - 1
    next_idx[next_idx == len(A)], prev_idx[prev_idx == -1] = len(A) - 1, 0 #fixing extreme cases
    
    # find two set of indices with respect to initial array: a set of previous points and a set of next points
    previous_idx, next_idx = argidx[prev_idx], argidx[next_idx]
    
    # find distances between points of B and it closest companions of A in both of sets
    previous_dist, next_dist = np.abs(A[previous_idx]- B), np.abs(A[next_idx]- B)
    
    # find indices of minimal values of distances found
    argmin_previous, argmin_next = np.argmin(previous_dist), np.argmin(next_dist)
    
    # if minimum value of distances in the first set is smaller than the one in the second set, 
    # return its place in initial array; else return place of minimum value in the second set
    if previous_dist[argmin_previous] < next_dist[argmin_next]:
        out = previous_idx[argmin_previous]
    else:
        out = next_idx[argmin_next]
    return out