How would I use Python and numpy to identify when there are x number of neighboring values in an arr

I have a numpy array that has about 40,000 values in it. I need to search it to see if there are any groups of neighboring values in this array that are identical.

As an example, let's say I have this array:

array_1 = [0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,3,4,1,1,1,5,6,7,3,2,5,1,1]

I would like my code to identify the chunk of ones at the beginning of the array and return an index array that tells me the location of these ones. So, something like this:

index_array_1 = [0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0]

Even though there are other groups of ones in array_1, I only want the index array to flag a group if it contains at least a certain number of ones (in this case, 14). Because the other groups of neighboring ones in array_1 consist of 3 or fewer values, they should not be flagged.

CodePudding user response：

import numpy as np

# x is your input array (a 1-D numpy array, e.g. x = np.asarray(array_1))
n = x.shape[0]

loc_run_start = np.empty(n, dtype=bool)
loc_run_start[0] = True
np.not_equal(x[:-1], x[1:], out=loc_run_start[1:])
run_starts = np.nonzero(loc_run_start)[0]

# find run values
run_values = x[loc_run_start]

# find run lengths
run_lengths = np.diff(np.append(run_starts, n))

print(run_values)  # Value of each run
# [0, 1, 2, 3, 4, 1, 5, 6, 7, 3, 2, 5, 1]

print(run_starts)  # Start of every run of equal item
# [0, 1, 15, 16, 17, 18, 21, 22, 23, 24, 25, 26, 27]

print(run_lengths) # Length of every run
# [1, 14, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 2]

From there you can check which run lengths meet your criterion (for example, run_lengths >= 14) and use the matching entries of run_starts to locate those runs.
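As a rough sketch of that last step (not part of the original answer), one way to turn the run lengths into the index array from the question is np.repeat, which expands one flag per run back to one flag per element. The names min_len, long_run and index_array_1 are chosen here for illustration only.

```python
import numpy as np

array_1 = [0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,3,4,1,1,1,5,6,7,3,2,5,1,1]
x = np.asarray(array_1)
min_len = 14  # assumed threshold, taken from the question

# Same run-length encoding as above
n = x.shape[0]
loc_run_start = np.empty(n, dtype=bool)
loc_run_start[0] = True
np.not_equal(x[:-1], x[1:], out=loc_run_start[1:])
run_starts = np.nonzero(loc_run_start)[0]
run_lengths = np.diff(np.append(run_starts, n))

# One boolean per run, then repeat each flag run_length times
long_run = run_lengths >= min_len
index_array_1 = np.repeat(long_run, run_lengths).astype(int)

print(run_starts[long_run])    # start index of each qualifying run
print(index_array_1.tolist())  # 0/1 flag per element of x
```

This flags long runs of any value; if only runs of ones should count, the condition could presumably be tightened to (run_lengths >= min_len) & (x[loc_run_start] == 1).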

Credit to author for the neat piece of code.

CodePudding user response：

Another solution with itertools.groupby:

from itertools import groupby

out = []
for _, g in groupby(array_1):
    tmp = sum(1 for _ in g)
    out.extend([tmp >= 14] * tmp)

print(out)

Prints:

[False, True, True, True, True, True, True, True, True, True, True, True, True, True, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False]

If you want 0/1 instead, cast each flag with int():

out = []
for _, g in groupby(array_1):
    tmp = sum(1 for _ in g)
    out.extend([int(tmp >= 14)] * tmp)

print(out)

Prints:

[0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

CodePudding user response：

We could use diff + flatnonzero to get the start and end points of each run of identical neighboring values, then use them to create a boolean mask with numpy broadcasting:

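import numpy as np

num = 14
len_arr = len(array_1)
# positions where the value changes, shifted by one so they mark the first element of each new run
cutoffs = np.flatnonzero(np.diff(array_1)!=0) + 1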
start = np.r_[0, cutoffs]
end = np.r_[cutoffs, len_arr]
s, e = np.c_[start, end][end-start >= num].T.reshape(2,-1,1)
out = ((np.arange(len_arr) >= s) & (np.arange(len_arr) < e)).sum(axis=0)

Output:

array([0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0])
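
To make the broadcasting step easier to follow, here is the same idea on a shorter array with a smaller threshold. This is only a sketch: the array a and the threshold k are made up for illustration, and the start/end columns are built with [:, None] rather than the reshape used above.

```python
import numpy as np

a = np.array([5, 5, 5, 2, 2, 7, 7, 7, 7])  # hypothetical short input
k = 3                                      # hypothetical minimum run length

cutoffs = np.flatnonzero(np.diff(a) != 0) + 1  # indices where a new run begins
start = np.r_[0, cutoffs]                      # first index of each run
end = np.r_[cutoffs, len(a)]                   # one past the last index of each run
keep = (end - start) >= k                      # runs that are long enough

# Column vectors of shape (runs_kept, 1) broadcast against a row of positions,
# giving one row per kept run that is True inside that run
s = start[keep][:, None]
e = end[keep][:, None]
pos = np.arange(len(a))
mask = ((pos >= s) & (pos < e)).any(axis=0)

print(mask.astype(int))
```

Note that the intermediate comparison has shape (number of kept runs, len(a)), so memory use grows with the number of qualifying runs.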