Python comparing array to zero faster than np.any(array)

I want to test whether all elements of an array are zero. According to the StackOverflow posts "Test if numpy array contains only zeros" and https://stackoverflow.com/a/72976775/5269892, not array.any() should be both the most memory-efficient and the fastest method, compared to (array == 0).all().

I tested the performance with an array of random floats, see below. Somehow, though, at least for the given array size, not array.any() and even just casting the array to boolean type appear to be slower than (array == 0).all(). How come?

import numpy as np

np.random.seed(100)
a = np.random.rand(10418*144)  # 1,500,192 random floats in [0, 1)

%timeit (a == 0)
%timeit (a == 0).all()
%timeit a.astype(bool)
%timeit a.any()
%timeit not a.any()

# 711 µs ± 192 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# 740 µs ± 1.38 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# 1.69 ms ± 587 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# 1.71 ms ± 1.31 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# 1.71 ms ± 2.05 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

CodePudding user response:

The problem is that the first two operations are vectorized using SIMD instructions while the last three are not. More specifically, the last three calls perform an implicit conversion to bool (_aligned_contig_cast_double_to_bool), which is not yet vectorized. This is a known issue and I have already proposed a pull request for it (which revealed some unexpected issues caused by undefined behavior, now fixed). If everything goes well, it should be available in the next major release of NumPy.
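To reproduce the comparison outside IPython, here is a minimal sketch that assumes only NumPy and the standard timeit module. The helper names all_zero_cmp and all_zero_any are made up for illustration. It first checks that both tests agree and then times them; the numbers depend on the NumPy version and the CPU, so none are quoted here.

```python
import timeit

import numpy as np

rng = np.random.default_rng(100)
a = rng.random(10418 * 144)   # same size as in the question
z = np.zeros_like(a)          # an array that really is all zeros


def all_zero_cmp(arr):
    # The comparison yields a bool array, so the reduction runs on bool directly.
    return bool((arr == 0).all())


def all_zero_any(arr):
    # The reduction is applied to the float array; per the answer, this path
    # goes through an implicit cast to bool.
    return not arr.any()


# Both formulations must give the same answer before timing them.
for arr in (a, z):
    assert all_zero_cmp(arr) == all_zero_any(arr)

for name, fn in (("(a == 0).all()", all_zero_cmp), ("not a.any()", all_zero_any)):
    best = min(timeit.repeat(lambda: fn(a), number=20, repeat=3)) / 20
    print(f"{name:>16}: {best * 1e6:.0f} us per call")
```

If the gap between the two lines disappears on a newer NumPy, that would be consistent with the cast having been vectorized in the meantime.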

Note that a.any() and not a.any() implicitly cast the input to an array of booleans so that the any operation itself can then run faster. This is not very efficient, but it is done this way to reduce the number of generated function variants: NumPy is written in C, so a different implementation has to be generated for each type, and optimizing many variants is hard, so we prefer to perform implicit casts here. This also reduces the size of the generated binaries. If this is not enough, you can use Cython to generate faster, specialized code.
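If Cython is not an option, one pure-NumPy possibility is to run the vectorized comparison chunk by chunk. This is only a sketch and not the approach proposed above: is_all_zero and chunk_size are hypothetical names, and the default chunk size is an untuned guess. The idea is to keep the temporary boolean array small and to stop as soon as a nonzero value is found.

```python
import numpy as np


def is_all_zero(arr, chunk_size=65536):
    """Return True if every element of arr compares equal to zero."""
    flat = np.ravel(arr)  # a view for contiguous input, a copy otherwise
    for start in range(0, flat.size, chunk_size):
        # The temporary bool array holds at most chunk_size elements.
        if (flat[start:start + chunk_size] != 0).any():
            return False  # stop at the first chunk containing a nonzero value
    return True


x = np.zeros(1_000_000)
print(is_all_zero(x))
x[-1] = 1.0
print(is_all_zero(x))
```

Like (array == 0).all(), this returns True for an empty array. Whether it beats the single-pass comparison depends on the data: it helps most when a nonzero value tends to appear early, and it adds Python-level loop overhead when the array really is all zeros.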