Say that I have an array of arrays
import numpy as np
z = np.array(
[
[1, 1, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 0],
[1, 1, 1, 0, 0, 0],
[1, 1, 1, 1, 1, 1],
]
)
Where 1s start on the left side of each array, and 0s on the right side if any. For many applications, this is how arrays are padded so that each array is of the same length in an array of arrays.
How would I get the shortest sequence of non-zeros for such an array.
In this case, the shortest sequence is the first array, which has a length of 2.
The obvious answer is to iterate over each array and find the index of the first zero, but I feel that there's probably a method that takes more advantage of numpy's c processing.
CodePudding user response:
Use np.count_nonzero
np.min
:
res = np.count_nonzero(z, axis=1).min()
print(res)
Output
2
The function count_nonzero
returns an array like:
[2 5 3 6]
then simply find the minimum value.
If you want the index of the row, use np.argmin
instead.
CodePudding user response:
If you want to know which sub-array has the least zeros then you could use:
z_sums = z.sum(axis = 1)
z_least = np.argmin(z_sums)
CodePudding user response:
Benchmark with a 5000×5000 array:
74.3 ms Dani
34.2 ms user19077881
3.4 ms Kelly
It's a simple simple O(m n) saddleback search from top-right to bottom-left:
def Kelly(z):
m, n = z.shape
j = n - 1
for i in range(m):
while not z[i][j]:
j -= 1
if j < 0:
return 0
return j 1
Full benchmark code (Try it online!):
import numpy as np
from timeit import timeit
import random
m, n = 5000, 5000
def genz():
lo = random.randrange(n*5//100, n//3)
return np.array(
[
[1]*ones [0]*(n-ones)
for ones in random.choices(range(lo, n 1), k=m)
]
)
def Dani(z):
return np.count_nonzero(z, axis=1).min()
def user19077881(z):
z_sums = z.sum(axis = 1)
z_least = np.argmin(z_sums)
return z_least
def Kelly(z):
m, n = z.shape
j = n - 1
for i in range(m):
while not z[i][j]:
j -= 1
if j < 0:
return 0
return j 1
funcs = Dani, user19077881, Kelly
for _ in range(3):
z = genz()
for f in funcs:
t = timeit(lambda: f(z), number=1)
print('%5.1f ms ' % (t * 1e3), f.__name__)
print()