How would I check if each cell of an array has neighbors of a specified value quickly in numpy?

Say I have an array like

    np.array([[0,0,0,1,0],
              [0,0,0,0,0],
              [0,1,0,0,0],
              [0,0,0,1,0],
              [0,0,0,0,0]],dtype=bool)

and I want a boolean array marking every cell that has a neighboring cell that isn't 0 in that array, like:

    np.array([[0,0,1,0,1],
              [1,1,1,1,1],
              [1,0,1,1,1],
              [1,1,1,0,1],
              [0,0,1,1,1]],dtype=bool)

How would I do this without looping over everything in a Python loop (since that's really slow)?

CodePudding user response:

You can use a sliding window and take the maximum value in each window.

import numpy as np

def foo(arr, window):
    r, c = arr.shape
    wr, wc = window
    ans = arr * 0
    for i in range(r):
        for j in range(c):
            if not arr[i, j]:
                # Maximum over the window of half-widths (wr, wc), clipped at the array edges.
                ans[i, j] = arr[max(i - wr, 0):min(i + wr + 1, r),
                                max(j - wc, 0):min(j + wc + 1, c)].max()
            else:
                ans[i, j] = 0

    return ans

data = np.array([[0,0,0,1,0],
                 [0,0,0,0,0],
                 [0,1,0,0,0],
                 [0,0,0,1,0],
                 [0,0,0,0,0]])
foo(data, [1, 1])
# array([[0, 0, 1, 0, 1],
#        [1, 1, 1, 1, 1],
#        [1, 0, 1, 1, 1],
#        [1, 1, 1, 0, 1],
#        [0, 0, 1, 1, 1]])
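
The Python double loop can also be avoided with NumPy alone: a minimal sketch of the same windowed-maximum idea, assuming NumPy >= 1.20 for np.lib.stride_tricks.sliding_window_view (the helper name window_max is just for illustration):

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def window_max(arr):
    # Pad with zeros so that border cells still get a full 3x3 window.
    padded = np.pad(arr, 1)
    # Shape (rows, cols, 3, 3): one 3x3 window centred on every cell.
    windows = sliding_window_view(padded, (3, 3))
    # Maximum over each window, then zero out the cells that are already set.
    out = windows.max(axis=(2, 3))
    out[arr != 0] = 0
    return out

window_max(data)
# array([[0, 0, 1, 0, 1],
#        [1, 1, 1, 1, 1],
#        [1, 0, 1, 1, 1],
#        [1, 1, 1, 0, 1],
#        [0, 0, 1, 1, 1]])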

Or, using scipy.ndimage.maximum_filter:

from scipy.ndimage import maximum_filter

ans = maximum_filter(data, size=(3, 3))
ans[data == 1] = 0
ans
# array([[0, 0, 1, 0, 1],
#        [1, 1, 1, 1, 1],
#        [1, 0, 1, 1, 1],
#        [1, 1, 1, 0, 1],
#        [0, 0, 1, 1, 1]])
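
The same result can also be read as a morphological dilation: dilate with a full 3x3 structuring element, then drop the cells that were already set. A minimal sketch with scipy.ndimage.binary_dilation, assuming the same 3x3 neighbourhood:

from scipy.ndimage import binary_dilation

# True wherever any cell in the 3x3 neighbourhood (including the cell itself) is set.
dilated = binary_dilation(data, structure=np.ones((3, 3), dtype=bool))
# Keep only the cells that were not set in the input.
dilated & (data == 0)
# array([[False, False,  True, False,  True],
#        [ True,  True,  True,  True,  True],
#        [ True, False,  True,  True,  True],
#        [ True,  True,  True, False,  True],
#        [False, False,  True,  True,  True]])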

CodePudding user response:

If you just want to use NumPy, my approach is to find the neighbors of every true value in the original array: for each true value, mark the positions whose Chebyshev distance (L-infinity distance) to it is exactly 1, then merge the resulting masks with a logical OR:

>>> ar = np.array([[0,0,0,1,0],
...                [0,0,0,0,0],
...                [0,1,0,0,0],
...                [0,0,0,1,0],
...                [0,0,0,0,0]], bool)
>>> row, col = ar.nonzero()
>>> rows, cols = np.indices(ar.shape)
>>> np.logical_or.reduce([np.maximum(np.abs(rows - i), np.abs(cols - j)) == 1 for i, j in zip(row, col)])
array([[False, False,  True, False,  True],
       [ True,  True,  True,  True,  True],
       [ True, False,  True,  True,  True],
       [ True,  True,  True, False,  True],
       [False, False,  True,  True,  True]])

By broadcasting, you can also avoid the list comprehension:

>>> rows, cols = np.indices(ar.shape, sparse=True)  # Setting to sparse does not affect the calculation.
>>> i = np.abs(rows[None] - row[:, None, None])
>>> j = np.abs(cols[None] - col[:, None, None])
>>> np.logical_or.reduce(np.maximum(i, j) == 1)
array([[False, False,  True, False,  True],
       [ True,  True,  True,  True,  True],
       [ True, False,  True,  True,  True],
       [ True,  True,  True, False,  True],
       [False, False,  True,  True,  True]])

To make the code look shorter, I used None instead of np.newaxis; you can use the latter for readability.
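
The two spellings are interchangeable, since np.newaxis is simply an alias for None:

>>> np.newaxis is None
True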

After testing, even with the help of broadcasting it is slower than the second (maximum_filter) solution in @d.b's answer, but it is not too bad:

>>> from timeit import timeit
>>> from scipy.ndimage import maximum_filter
>>> def loop_reduce(ar):
...     row, col = ar.nonzero()
...     rows, cols = np.indices(ar.shape)
...     return np.logical_or.reduce([np.maximum(np.abs(rows - i), np.abs(cols - j)) == 1 for i, j in zip(row, col)])
...
>>> def broadcast_reduce(ar):
...     row, col = ar.nonzero()
...     rows, cols = np.indices(ar.shape, sparse=True)
...     i = np.abs(rows[None] - row[:, None, None])
...     j = np.abs(cols[None] - col[:, None, None])
...     return np.logical_or.reduce(np.maximum(i, j) == 1)
...
>>> def max_filter(ar):
...     ans = maximum_filter(ar, size=(3, 3))
...     ans[ar] = False
...     return ans
...
>>> timeit(lambda: loop_reduce(ar), number=10000)
0.3127206000208389
>>> timeit(lambda: broadcast_reduce(ar), number=10000)
0.15011020001838915
>>> timeit(lambda: max_filter(ar), number=10000)
0.12893440001062118

At least this can be a way for you to solve similar problems in the future :-)
