Given an array consisting of non-negative numbers, I want to be able to find the index of the middle 0 for each of the consecutive 0's in the array, excluding 0's that are found at the start or the end of the array. A simple example: we have given array
np.array([0, 0, 0, 1, 2, 0, 0, 0, 3, 2, 0, 0, 0, 0, 0, 0, 5, 6, 0, 0, 0])
Then, I want to return indices 6 and 13 (for an even amount of consecutive 0's, I want either the floor or ceil).
Now, my current implementation is as follows:
def middle_zero(array):
busy = False
not_start_zero = False
middle_zero_list = []
for i in range(len(array)):
if array[i] > 0:
not_start_zero = True
if array[i] == 0 and not_start_zero and not busy:
start = i
busy = True
if busy and array[i] > 0:
end = i
middle_zero_list.append(int(np.mean([start, end])))
busy = False
return np.array(middle_zero_list)
middle_zero(np.array([0, 0, 0, 1, 2, 0, 0, 0, 3, 2, 0, 0, 0, 0, 0, 0, 5, 6, 0, 0, 0]))
>> array([6, 13])
Although it is working, I think that a much more clever solution is possible here. As I have to do this computation many times, I want it to be more efficient.
CodePudding user response:
Simply wrapping your function in numba.njit
(replacing the np.mean
with a direct computation) gets this a nice speedup:
[nav] In [8]: @numba.njit
...: def middle_zero2(array):
...: busy = False
...: not_start_zero = False
...:
...: middle_zero_list = []
...:
...: for i in range(len(array)):
...: if array[i] > 0:
...: not_start_zero = True
...:
...: if array[i] == 0 and not_start_zero and not busy:
...: start = i
...: busy = True
...: if busy and array[i] > 0:
...: end = i
...: middle_zero_list.append(int((start end)/2))
...: busy = False
...:
...: return np.array(middle_zero_list)
[ins] In [10]: %timeit middle_zero(a)
24.3 µs ± 466 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
# skipped a cache run
[ins] In [12]: %timeit middle_zero2(a)
763 ns ± 20.2 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
about a factor of 30 of a speedup on your small example test case.
CodePudding user response:
Note this doesn't include solitary non-positives, ie middle_zero(np.array([1,0,2]))
gives array([], dtype=float64)
If you want a numpy only solution, you could do this (which shows significant speed improvements if the input array is large though it's slower than @DominikStańczak answer using numba (actually it depends on input - see below) - you can also use numba with this though it doesn't speed it up much):
import numpy as np
def middle_zero(array):
argwhere = np.argwhere(array > 0).ravel()
diff = np.ediff1d(argwhere)
result = argwhere[:-1] diff//2 # Or argwhere[:-1] np.ceil(diff/2) for consistency with OP
return result[result != argwhere[:-1]] #Or result[result-1 != argwhere[:-1]] for consistency with OP
Output with given example:
array([ 6, 12])
Edit timings:
with input as np.array([0, 0, 0, 1, 2, 0, 0, 0, 3, 2, 0, 0, 0, 0, 0, 0, 5, 6, 0, 0, 0])
25.1 µs ± 227 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) #OP
11.8 µs ± 30.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) #Nin17
1.18 µs ± 3.97 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) #Dominik
With input as np.random.default_rng().integers(0, 2, size=100000)
225 ms ± 1.99 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) #OP
833 µs ± 3.76 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) #Nin17
481 µs ± 2.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) #Dominik
If you have long regions of consecutive non-positive numbers, this is actually the fastest, with an input of np.array([1,*[0]*1000,1]*1000)
:
539 ms ± 1.91 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) #OP
432 µs ± 1.35 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) #Nin17
1.93 ms ± 2.24 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) #Dominik
CodePudding user response:
I don't know if this is faster, but should be metered. It should be faster on a compiled language, but python is not.
def middle_zero(array):
busy = False
not_start_zero = False
start=None
end=None
middle_zero_list = []
for i, value in enumerate(array):
isZero = value == 0
not_start_zero = [True, not_start_zero][isZero]
start, busy = [[start, busy], [i, True]][isZero and not_start_zero and not busy]
if busy and not isZero:
end = i
middle_zero_list.append(int(np.mean([start, end])))
busy = False
return np.array(middle_zero_list)