Given two Numpy arrays, A and B, how would you find the index where B occurs in A, allowing for some amount of noise?
For example:
>>> A = [1.2, 4.5, 18.1, 19.1, 3.3, 7.4, 9.5, 1.0, 6.5, 4.9, 2.4]
>>> B = [19.15, 3.35, 7.3]
>>> find_position(A, B)
3
The naive implementation of find_position(a, b)
would be to just just loop over every index in A, and then from there iterate over B, calculating it's Euclidean distance for each pair of numbers from A and B, tracking the index that has the smallest distance.
Something like:
def find_position(a, b):
"""
Finds index of b in a.
"""
best = (1e999999, None)
assert a.size >= b.size
for i in range(a.size - b.size):
score = sum(abs(b[j] - a[i j]) for j in range(b.size))
best = min(best, (score, i))
return best[1]
I'm guessing this is far from the most efficient solution. I'm not sure what the exact Big-O notation would be, but it's probably close to O(M*N), so for large arrays, this would take forever.
Is there a more efficient approach, or some method built-in to Numpy that makes those nested for loops a bit faster?
CodePudding user response:
Refered to followed Link, a numpy solution.
import numpy as np
def indexTol(a, b, tol=1e-6):
a = np.unique(a)
intervals = np.empty(2*a.size, float)
intervals[::2] = a - tol
intervals[1::2] = a tol
overlaps = intervals[:-1] >= intervals[1:]
overlaps[1:] = overlaps[1:] | overlaps[:-1]
keep = np.concatenate((~overlaps, [True]))
intervals = intervals[keep]
return np.searchsorted(intervals, b, side='right') & 1 == 1
A = np.array([1.2, 4.5, 18.1, 19.1, 3.3, 7.4, 9.5, 1.0, 6.5, 4.9, 2.4])
B = np.array([19.15, 3.35, 7.3])
np.nonzero(indexTol(B,A,tol=0.05))
(array([3], dtype=int64),)
CodePudding user response:
You could reduce complexity to O(log(M) * N)
if you calculate distances between points of B and it closest companions in A from both left and right sides only:
def find_position(A, B):
#find indices that sorts an array
argidx = np.argsort(A)
# find two set of indices with respect to sorted array
next_idx = np.searchsorted(A, B, sorter=arg_idx)
prev_idx = next_idx - 1
next_idx[next_idx == len(A)], prev_idx[prev_idx == -1] = len(A) - 1, 0 #fixing extreme cases
# find two set of indices with respect to initial array: a set of previous points and a set of next points
previous_idx, next_idx = argidx[prev_idx], argidx[next_idx]
# find distances between points of B and it closest companions of A in both of sets
previous_dist, next_dist = np.abs(A[previous_idx]- B), np.abs(A[next_idx]- B)
# find indices of minimal values of distances found
argmin_previous, argmin_next = np.argmin(previous_dist), np.argmin(next_dist)
# if minimum value of distances in the first set is smaller than the one in the second set,
# return its place in initial array; else return place of minimum value in the second set
if previous_dist[argmin_previous] < next_dist[argmin_next]:
out = previous_idx[argmin_previous]
else:
out = next_idx[argmin_next]
return out