Suppose I have some array like
a = np.random.random(100) > 0.5
array([ True, True, False, False, True, False, False, True, True,
True, False, False, True, False, True, False, True, True,
False, False, False, False, False, True,...
I want to find the start indices of all sections of neighbouring Trues with a minimum length of X. So for X=3 in the random snippet above I would want 7. For X=2 I should get 0, 7, 16.
I can do this with loops, but I am wondering whether anyone can suggest a smarter way.
CodePudding user response:
import numpy as np
from scipy.signal import find_peaks
a = np.array([True, True, False, False, True, False, False, True, True,
True, False, False, True, False, True, False, True, True,
False, False, False, False, False, True])
_, peaks = find_peaks(np.concatenate([np.zeros(1), a]), width=3)
result = peaks["left_bases"]
print(result)
Output
[7]
For width=2, you have:
_, peaks = find_peaks(np.concatenate([np.zeros(1), a]), width=2)
result = peaks["left_bases"]
print(result)
Output
[ 0 7 16]
CodePudding user response:
You can use a convolution:
convolution = np.convolve(a, np.array([1, 1, 1]))
np.where(convolution == 3)[0] - 2
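Here, convolving with [1, 1, 1] sums each element with the two elements before it, so the result equals 3 exactly at the last position of every window of three consecutive Trues. You can then find all the indices where 3 is reached and subtract 2 to get the start of each window.
Here is the generalisation to any number of consecutive Trues: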
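def find_consecutive_sequences(number_of_consecutive, a):
    # Sliding-window sum: a window full of Trues sums to number_of_consecutive
    convolution = np.convolve(a, np.ones(shape=(number_of_consecutive)))
    # Shift from the last index of each window back to its first index
    return np.where(convolution == number_of_consecutive)[0] - (number_of_consecutive - 1)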
print(find_consecutive_sequences(3, a))
print(find_consecutive_sequences(4, a))
print(find_consecutive_sequences(5, a))
which gives
[ 7 16 17]
[16]
[]
for a (slightly modified to test the 4 and 5 cases) being
a = np.array([ True, True, False, False, True, False, False, True, True,
True, False, False, True, False, True, False, True, True,
True, True, False, False])
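Note that the convolution reports the start of every window of consecutive Trues, so a long run contributes several indices. If only the first index of each run is wanted, as in the question, one possible refinement is to keep a window start only when the previous index is not also a window start. The sketch below assumes that reading of the question; the name run_starts_from_windows is made up for illustration.

```python
import numpy as np

def run_starts_from_windows(a, x):
    """Return the first index of every run of at least x consecutive Trues."""
    a = np.asarray(a, dtype=int)
    # Sliding-window sum; equals x at the last index of each all-True window
    conv = np.convolve(a, np.ones(x, dtype=int))
    window_starts = np.where(conv == x)[0] - (x - 1)
    if window_starts.size == 0:
        return window_starts
    # Window starts inside one run are consecutive integers, so keep only
    # those that do not directly follow another window start
    keep = np.concatenate(([True], np.diff(window_starts) > 1))
    return window_starts[keep]

# Array from the question
a = np.array([True, True, False, False, True, False, False, True, True,
              True, False, False, True, False, True, False, True, True,
              False, False, False, False, False, True])
print(run_starts_from_windows(a, 3))  # [7]
print(run_starts_from_windows(a, 2))  # [ 0  7 16]
```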
CodePudding user response:
You can find consecutive Trues by taking the cumulative sum of the boolean array, splitting that cumsum array into subarrays of consecutive numbers, and extracting the starting points of the subarrays that have a length of at least X.
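def starting_point_of_X_consecutive_Trues(arr, X):
    arr_cumsum = arr.cumsum()
    # Split wherever the cumulative sum stops increasing by exactly 1
    splits = np.split(arr_cumsum, np.where(np.diff(arr_cumsum) != 1)[0] + 1)
    # The first split starts directly with its run of Trues
    relevant_points = [splits[0][0]] if len(splits[0]) >= X else []
    # Later splits begin with the last value of the previous run, so skip it
    relevant_points += [split[1] for split in splits[1:] if len(split)-1 >= X]
    return np.isin(arr_cumsum, relevant_points).nonzero()[0]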
Output:
starting_point_of_X_consecutive_Trues(a, 3) # [7]
starting_point_of_X_consecutive_Trues(a, 2) # [0,7,16]
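As a point of comparison, the same result can probably be obtained more directly from the rising and falling edges of the array. This is a minimal sketch of a different technique (edge detection with np.diff on a zero-padded copy), not the cumsum method above; the function name true_run_starts is an assumption made for illustration.

```python
import numpy as np

def true_run_starts(a, min_len):
    """Return the start index of every run of at least min_len Trues."""
    # Pad with a 0 on each side so runs touching either end still have edges
    padded = np.concatenate(([0], np.asarray(a, dtype=np.int8), [0]))
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)   # index of the first True in each run
    ends = np.flatnonzero(edges == -1)    # index one past the last True
    # Run length is ends - starts; keep runs that are long enough
    return starts[(ends - starts) >= min_len]

# Array from the question
a = np.array([True, True, False, False, True, False, False, True, True,
              True, False, False, True, False, True, False, True, True,
              False, False, False, False, False, True])
print(true_run_starts(a, 3))  # [7]
print(true_run_starts(a, 2))  # [ 0  7 16]
```

Because of the padding, this version should behave the same whether the array begins or ends with True or False.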