Accelerated matrix search in python

I currently have a way to find the index of a submatrix in the main matrix, but it is slow. Can anyone think of a faster way?

import numpy as np

a = np.array([[5, 4, 5, 9, 3, 4, 6, 2, 5, 3],
              [8, 3, 5, 4, 3, 4, 5, 8, 4, 4],
              [5, 7, 8, 5, 2, 3, 3, 6, 8, 8],
              [4, 5, 6, 2, 6, 5, 6, 7, 9, 3],
              [3, 6, 8, 2, 8, 7, 3, 8, 8, 8]])

b = np.array([[2, 3, 3],
              [6, 5, 6]])

def search_image(kernel, array):
    array_H, array_W = array.shape
    kernel_H, kernel_W = kernel.shape
    # Slide the kernel over every possible top-left position.
    for x in range(0, array_W - kernel_W + 1):
        for y in range(0, array_H - kernel_H + 1):
            array_subsection = array[y:y + kernel_H, x:x + kernel_W]
            if (array_subsection == kernel).all():
                print(f"kernel was found at x={x}, y={y}.")
                return [x, y]
    print("kernel was not found in array.")

CodePudding user response:

Assuming the values of the small matrix are not repeated many times in the big matrix, and the small matrix is far smaller than the big one, you can search for the first item of the small matrix in the big one and then check for the actual small matrix only at the candidate locations found. This should reduce the number of sub-searches by a large factor. Here is an example:

def search_image_faster(kernel, array):
    h, w = kernel.shape
    assert w > 0 and h > 0

    # Candidate locations: every position whose value matches the
    # first item of the kernel.
    locs = np.where(array == kernel[0, 0])

    for i in range(len(locs[0])):
        y, x = locs[0][i], locs[1][i]
        if np.array_equal(array[y:y + h, x:x + w], kernel):
            #print(f"kernel was found at x={x}, y={y}.")
            return [x, y]

    #print("kernel was not found in array.")

This strategy can be enhanced by checking other values like the last item of the small matrix. Still, there are pathological cases where this algorithm is not more efficient than the initial one.
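For instance, a variant along these lines (a hypothetical search_image_faster2, sketched under the same assumptions) can require both the first and the last kernel value to match before doing the full comparison:

import numpy as np

def search_image_faster2(kernel, array):
    h, w = kernel.shape
    ah, aw = array.shape
    assert w > 0 and h > 0

    # Only consider top-left corners where the kernel fully fits
    # and where the first kernel value matches.
    ys, xs = np.where(array[:ah - h + 1, :aw - w + 1] == kernel[0, 0])

    for y, x in zip(ys, xs):
        # Cheap second filter: the last kernel value must match too.
        if array[y + h - 1, x + w - 1] != kernel[h - 1, w - 1]:
            continue
        if np.array_equal(array[y:y + h, x:x + w], kernel):
            return [x, y]
    return None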

The thing is, NumPy alone provides no way to do this efficiently, AFAIK. One very efficient solution is to use Numba:

import numba as nb

@nb.njit(inline="always")
def match_subimage(kernel, array, ay, ax):
    kh, kw = kernel.shape
    ah, aw = array.shape

    if ay + kh > ah or ax + kw > aw:
        return False

    for ky in range(kh):
        for kx in range(kw):
            if array[ay + ky, ax + kx] != kernel[ky, kx]:
                return False

    return True

@nb.njit('(uint8[:,::1], uint8[:,::1])')
def search_image_fastest(kernel, array):
    kh, kw = kernel.shape
    ah, aw = array.shape
    assert kh + kw > 0

    kv = kernel[0, 0]

    # Note: it is faster to iterate over the x location in the nested loop
    for y in range(ah):
        for x in range(aw):
            # Fast-fail optimization
            if array[y, x] != kv:
                continue

            if match_subimage(kernel, array, y, x):
                #print(f"kernel was found at x={x}, y={y}.")
                return (x, y)

    #print(f"kernel was not found in array.")
    return None

Here are the timings for the input matrices on my i5-9600KF processor:

search_image:          55.2 µs
search_image_faster:   14.7 µs
search_image_fastest:   0.5 µs

The Numba implementation is about 100 times faster than the initial implementation. In fact, it is so fast that more than 60% of its execution time is just the overhead of calling the Numba function from CPython, not the actual computation.
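For reference, here is a minimal sketch of how such measurements can be reproduced, assuming the a and b matrices from the question (the explicit Numba signature expects C-contiguous uint8 arrays, hence the conversion):

import numpy as np
from timeit import timeit

# search_image_fastest was compiled for C-contiguous uint8 arrays,
# so the example matrices are converted first.
a8 = np.ascontiguousarray(a, dtype=np.uint8)
b8 = np.ascontiguousarray(b, dtype=np.uint8)

print(search_image(b, a))            # -> [4, 2] (plus a "found" message)
print(search_image_faster(b, a))     # -> [4, 2]
print(search_image_fastest(b8, a8))  # -> (4, 2)

# Rough per-call timings; search_image prints on every call, so comment
# out its print statements before timing it the same way.
n = 10_000
print(timeit(lambda: search_image_faster(b, a), number=n) / n)
print(timeit(lambda: search_image_fastest(b8, a8), number=n) / n)

The exact numbers will of course vary with the machine and the input matrices.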

If this is not enough, you can use multiple Numba threads, but using multiple threads can make the analysis of small input matrices slower due to the time needed to start the threads.
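Here is a rough sketch of such a multi-threaded variant (a hypothetical search_image_parallel reusing the match_subimage helper above; rows are distributed over Numba threads, so a found match does not stop the scan of the remaining rows):

import numpy as np
import numba as nb

@nb.njit('(uint8[:,::1], uint8[:,::1])', parallel=True)
def search_image_parallel(kernel, array):
    kh, kw = kernel.shape
    ah, aw = array.shape
    kv = kernel[0, 0]

    # One slot per row: -1 means no match starts in that row.
    found_x = np.full(ah, -1, dtype=np.int64)

    for y in nb.prange(ah):
        for x in range(aw):
            # Same fast-fail check as in the serial version
            if array[y, x] != kv:
                continue
            if match_subimage(kernel, array, y, x):
                found_x[y] = x
                break

    # Report the match with the smallest y, like the serial version.
    for y in range(ah):
        if found_x[y] >= 0:
            return (found_x[y], y)
    return None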

If this is still not enough, then you can pre-compute a checksum of the small matrix and, while iterating over the big matrix, compute the checksum of each candidate window and compare it to the precomputed one before doing a full comparison. Like the first solution, this strategy aims to reduce the number of expensive exact matches, except that the probability of two checksums being equal should be even smaller, so it can be faster in pathological cases.
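A possible sketch of that idea in plain NumPy (a hypothetical search_image_checksum) uses a summed-area table so that the sum of every window, acting as the checksum, is obtained in constant time:

import numpy as np

def search_image_checksum(kernel, array):
    kh, kw = kernel.shape
    ah, aw = array.shape
    if kh > ah or kw > aw:
        return None

    target = int(kernel.sum())

    # Summed-area table with a zero border: the sum of any window is
    # then a combination of four table lookups.
    sat = np.zeros((ah + 1, aw + 1), dtype=np.int64)
    sat[1:, 1:] = np.cumsum(np.cumsum(array.astype(np.int64), axis=0), axis=1)

    # Sum of every kh-by-kw window, computed in a vectorized way.
    win = sat[kh:, kw:] - sat[:-kh, kw:] - sat[kh:, :-kw] + sat[:-kh, :-kw]

    # Only windows whose checksum matches are compared exactly.
    for y, x in zip(*np.nonzero(win == target)):
        if np.array_equal(array[y:y + kh, x:x + kw], kernel):
            return [x, y]
    return None

Any window rejected by the checksum cannot be an exact match, so the expensive element-wise comparison only runs on the few windows whose sum happens to collide with the kernel's.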
