Number of times an array is present in another array in Python-CodePudding

How can I count the number of times an array is present in a larger array?

a = np.array([1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1])
b = np.array([1, 1, 1])

The count for the number of times b is present in a should be 3

b can be any combination of 1s and 0s

I'm working with huge arrays, so for loops are pretty slow

CodePudding user response：

If the subarray being searched for contains all 1s, you can count the number of times the subarray appears in the larger array by convolving the two arrays with np.convolve and counting the number of entries in the result that equal the size of the subarray:

# 'valid' = convolve only over the complete overlap of the signals
>>> np.convolve(a, b, mode='valid')
array([1, 1, 2, 3, 2, 2, 2, 3, 3, 2, 1, 1])
#               ^           ^  ^             <= Matches

>>> win_size = min(a.size, b.size)
>>> np.count_nonzero(np.convolve(a, b) == win_size)
3

For subarrays that may contain 0s, you can start by using convolution to transform a into an array containing the binary numbers encoded by each window of size b.size. Then just compare each element of the transformed array with the binary number encoded by b and count the matches:

>>> b = np.array([0, 1, 1])           # encodes '3'
>>> weights = 2 ** np.arange(b.size)  # == [1, 2, 4, 8, ..., 2**(b.size-1)]

>>> np.convolve(a, weights, mode='valid')
array([4, 1, 3, 7, 6, 5, 3, 7, 7, 6, 4, 1])
#            ^           ^                  Matches

>>> target = (b * np.flip(weights)).sum()  # target==3
>>> np.count_nonzero(np.convolve(a, weights, mode='valid') == target)
2

CodePudding user response：

Here is a solution using a list comprehension:

a = [1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1]
b = [1, 1, 1]
sum(a[i:i len(b)]==b for i in range(len(a)-len(b)))

output: 3

CodePudding user response：

Not a super fast method, but you can view a as a windowed array using np.lib.stride_tricks.sliding_window_view:

window = np.lib.stride_tricks.sliding_window_view(a, b.shape)

You can now equate this to b directly and find where they match:

result = (window == b).all(-1).sum()

For older versions of numpy (pre-1.20.0), you can use np.libs.stride_tricks.as_strided to achieve a similar result:

window = np.lib.stride_tricks.as_strided(
    a, shape=(*(np.array(a.shape) - b.shape   1), *b.shape),
    strides=a.strides   (a.strides[0],) * b.ndim)

CodePudding user response：

Here are a few improvements on @Brian's answer:

Use np.correlate not np.convolve; they are nearly identical but convolve reads a and b in opposite directions
To deal with templates that have zeros convert the zeros to -1. For example:

a = np.array([1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1])
b = np.array([0,1,1])
np.correlate(a,2*b-1)
# array([-1,  1,  2,  1,  0,  0,  2,  1,  1,  0, -1,  1])

The template fits where the correlation equals the number of ones in the template. The indices can be extracted like so:

(np.correlate(a,2*b-1)==np.count_nonzero(b)).nonzero()[0]
# array([2, 6])

If you only need the count use np.count_nonzero

np.count_nonzero((np.correlate(a,2*b-1)==np.count_nonzero(b)))
# 2