How can I extract indices from a numpy array where the size of a contiguous matching section is larger than X?

Suppose I have some array like

a = np.random.random(100) > 0.5

array([ True,  True, False, False,  True, False, False,  True,  True,
        True, False, False,  True, False,  True, False,  True,  True,
       False, False, False, False, False,  True,...

I want to find the start index of every run of consecutive Trues whose length is at least X. So for X=3 in the random snippet above I would want 7, and for X=2 I should get 0, 7, 16.

I can do this with loops, but I'm wondering whether there is a smarter (vectorized) way.
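For reference, a plain loop that pins down the intended behaviour (runs of at least X Trues, including a run that reaches the end of the array) might look like this. It is only a sketch of the loop approach the question mentions; `run_starts_loop` is a hypothetical name.

```python
def run_starts_loop(a, x):
    """Plain-Python reference: start index of each run of >= x consecutive Trues."""
    starts = []
    run_start = None  # index where the current run of Trues began, if any
    for i, value in enumerate(a):
        if value and run_start is None:
            run_start = i  # a new run begins
        elif not value and run_start is not None:
            if i - run_start >= x:  # the run covered indices run_start .. i-1
                starts.append(run_start)
            run_start = None
    # A run that reaches the end of the array is only closed here
    if run_start is not None and len(a) - run_start >= x:
        starts.append(run_start)
    return starts
```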

Answer:

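Try scipy.signal.find_peaks: each run of Trues forms a flat peak whose width equals the run length, so the width argument sets the minimum run length.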

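import numpy as np
from scipy.signal import find_peaks

a = np.array([True, True, False, False, True, False, False, True, True,
              True, False, False, True, False, True, False, True, True,
              False, False, False, False, False, True])

# Pad with a 0 on both sides so that runs touching either end still count as
# flat peaks; thanks to the 1-element left pad, each peak's left base (the 0
# just before the run) equals the run's start index in a
_, properties = find_peaks(np.pad(a.astype(int), 1), width=3)
result = properties["left_bases"]
print(result)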

Output

[7]

For width=2, you have:

_, properties = find_peaks(np.pad(a.astype(int), 1), width=2)
result = properties["left_bases"]
print(result)

Output

[ 0  7 16]
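find_peaks only reports a flat peak that has a lower neighbour on both sides, so a run touching the right end of the array needs a 0 appended as well as prepended. A quick check of that edge case, assuming SciPy is installed (`b` is a made-up input):

```python
import numpy as np
from scipy.signal import find_peaks

# A run of three Trues that touches the right end of the array
b = np.array([False, True, True, True])

# Left padding only: the plateau has no lower right neighbour, so no peak is found
_, props_left = find_peaks(np.concatenate([np.zeros(1), b]), width=3)
# Padding on both sides: the run is a proper flat peak again
_, props_both = find_peaks(np.pad(b.astype(int), 1), width=3)

print(props_left["left_bases"])  # []
print(props_both["left_bases"])  # [1]
```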

Answer:

You can use a convolution:

convolution = np.convolve(a, np.array([1, 1, 1]))
np.where(convolution == 3)[0] - 2

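Here the kernel [1, 1, 1] sums each element with the two elements before it (np.convolve uses mode="full" by default), so a value of 3 marks the last element of three consecutive Trues. Find all indices where 3 is reached and subtract 2 to get the start of each such window.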
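A small worked example makes the index shift concrete. In NumPy's default full mode, output element n is a[n-2] + a[n-1] + a[n] (terms outside the input count as 0), which is why 2 is subtracted; the input here is made up for illustration.

```python
import numpy as np

a = np.array([1, 1, 0, 1, 1, 1])
kernel = np.ones(3, dtype=int)

# Default mode="full": output[n] = a[n-2] + a[n-1] + a[n]
convolution = np.convolve(a, kernel)
print(convolution)  # [1 2 2 2 2 3 2 1]

# A value of 3 marks the last of three consecutive ones; subtract 2 for the start
print(np.where(convolution == 3)[0] - 2)  # [3]
```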

Here is the generalisation to any number of consecutive Trues:

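def find_consecutive_sequences(number_of_consecutive, a):
    # Full mode: element n is the sum of the number_of_consecutive elements ending at index n
    convolution = np.convolve(a, np.ones(number_of_consecutive))
    # A window made only of Trues sums to number_of_consecutive; shift from window end to start
    return np.where(convolution == number_of_consecutive)[0] - (number_of_consecutive - 1)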

print(find_consecutive_sequences(3, a))
print(find_consecutive_sequences(4, a))
print(find_consecutive_sequences(5, a))

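which gives the following (every window of X consecutive Trues is reported, so a run longer than X yields several consecutive indices, e.g. 16, 17 and 18 for X = 3):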

[ 7 16 17 18]
[16 17]
[16]

for a (slightly modified to test runs of length 4 and 5) defined as

a = np.array([ True,  True, False, False,  True, False, False,  True,  True,
               True, False, False,  True, False,  True, False,  True,  True,
               True,  True,  True, False, False])
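Because every length-X window is reported, a run longer than X produces several indices. One way to keep only the start of each run is to drop any window that is preceded by a True; for X = 3 only 7 and 16 survive from the output above. This is a sketch, and `run_starts_at_least` is a hypothetical helper name.

```python
import numpy as np

def run_starts_at_least(a, x):
    """Start indices of runs of at least x consecutive Trues (hypothetical helper)."""
    a = np.asarray(a, dtype=bool)
    # Start index of every length-x window that is entirely True
    window_sums = np.convolve(a.astype(int), np.ones(x, dtype=int))
    window_starts = np.nonzero(window_sums == x)[0] - (x - 1)
    # Keep a window only if it begins a run: it is at index 0 or preceded by a False
    is_run_start = (window_starts == 0) | ~a[np.maximum(window_starts - 1, 0)]
    return window_starts[is_run_start]
```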

Answer:

You can find runs of consecutive Trues from the cumulative sum of the boolean array. Inside a run the cumsum increases by exactly 1 at each step, so splitting the cumsum wherever it does not increase by 1 gives one subarray per run, and you keep the starting points of the runs whose length is at least X.
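To see what the split produces, here are the intermediate arrays for a short made-up input, using the same NumPy calls as the function:

```python
import numpy as np

arr = np.array([True, True, False, True, True, True, False])

# Inside a run of Trues the cumsum rises by exactly 1 per step
arr_cumsum = arr.cumsum()
print(arr_cumsum)  # [1 2 2 3 4 5 5]

# Split wherever the step is not 1, i.e. just before every False
breaks = np.where(np.diff(arr_cumsum) != 1)[0] + 1
print(breaks)  # [2 6]

splits = np.split(arr_cumsum, breaks)
# Pieces: [1 2] (run of 2 at the start), [2 3 4 5] (a False, then a run of 3), [5] (a lone False)
```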

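def starting_point_of_X_consecutive_Trues(arr, X):
    # A leading False sentinel means every run of Trues is preceded by a False
    arr_cumsum = np.concatenate(([False], arr)).cumsum()
    # The cumsum grows by exactly 1 inside a run; split wherever it doesn't, so
    # each piece is one False followed by the Trues after it
    splits = np.split(arr_cumsum, np.where(np.diff(arr_cumsum) != 1)[0] + 1)
    # split[1] is the cumsum value at the first True of a run of len(split) - 1 Trues
    relevant_points = [split[1] for split in splits if len(split) - 1 >= X]
    # First index reaching that value is the run start (minus 1 for the sentinel)
    return np.searchsorted(arr_cumsum, relevant_points) - 1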

Output:

starting_point_of_X_consecutive_Trues(a, 3) # [7]
starting_point_of_X_consecutive_Trues(a, 2) # [0,7,16]
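The same start/length bookkeeping can also be done without building Python lists, by differencing a zero-padded integer copy: a +1 step marks the first True of a run and a -1 step marks one past its last True. This is a sketch of that alternative technique, not part of the answer above; `run_starts_diff` is a hypothetical name.

```python
import numpy as np

def run_starts_diff(a, x):
    """Start indices of runs of at least x consecutive Trues (hypothetical helper)."""
    # Pad with 0 on both sides so every run has a rising and a falling edge
    padded = np.concatenate(([0], np.asarray(a, dtype=int), [0]))
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)  # index of the first True in each run
    ends = np.flatnonzero(edges == -1)   # index one past the last True in each run
    return starts[ends - starts >= x]
```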