In numpy, most computationally efficient way to find the array with shortest non-zero sequence in ar-CodePudding

Say that I have an array of arrays

import numpy as np 

z = np.array(
    [
     [1, 1, 0, 0, 0, 0],
     [1, 1, 1, 1, 1, 0],
     [1, 1, 1, 0, 0, 0],
     [1, 1, 1, 1, 1, 1],
    ]
)

Where 1s start on the left side of each array, and 0s on the right side if any. For many applications, this is how arrays are padded so that each array is of the same length in an array of arrays.

How would I get the shortest sequence of non-zeros for such an array.

In this case, the shortest sequence is the first array, which has a length of 2.

The obvious answer is to iterate over each array and find the index of the first zero, but I feel that there's probably a method that takes more advantage of numpy's c processing.

CodePudding user response：

Use np.count_nonzero np.min:

res = np.count_nonzero(z, axis=1).min()
print(res)

Output

The function count_nonzero returns an array like:

[2 5 3 6]

then simply find the minimum value.

If you want the index of the row, use np.argmin instead.

CodePudding user response：

If you want to know which sub-array has the least zeros then you could use:

z_sums = z.sum(axis = 1)
z_least = np.argmin(z_sums)

CodePudding user response：

Benchmark with a 5000×5000 array:

 74.3 ms  Dani
 34.2 ms  user19077881
  3.4 ms  Kelly

It's a simple simple O(m n) saddleback search from top-right to bottom-left:

def Kelly(z):
    m, n = z.shape
    j = n - 1
    for i in range(m):
        while not z[i][j]:
            j -= 1
            if j < 0:
                return 0
    return j   1

Full benchmark code (Try it online!):

import numpy as np
from timeit import timeit
import random

m, n = 5000, 5000

def genz():
    lo = random.randrange(n*5//100, n//3)
    return np.array(
        [
            [1]*ones   [0]*(n-ones)
            for ones in random.choices(range(lo, n 1), k=m)
        ]
    )

def Dani(z):
    return np.count_nonzero(z, axis=1).min()

def user19077881(z):
    z_sums = z.sum(axis = 1)
    z_least = np.argmin(z_sums)
    return z_least

def Kelly(z):
    m, n = z.shape
    j = n - 1
    for i in range(m):
        while not z[i][j]:
            j -= 1
            if j < 0:
                return 0
    return j   1

funcs = Dani, user19077881, Kelly

for _ in range(3):
    z = genz()
    for f in funcs:
        t = timeit(lambda: f(z), number=1)
        print('%5.1f ms ' % (t * 1e3), f.__name__)
    print()