Broadcast comparison on sliced numpy array using "," is a lot slower than "]["

Time:12-22

I'm not sure why a broadcast comparison on a NumPy array sliced with "," is a lot slower than one sliced with "][". For example:

import time
import numpy as np

start = time.time()
a = np.zeros((100, 100))
for _ in range(1000000):
    a[1:99][1:99] == 1
print(time.time() - start)

start = time.time()
a = np.zeros((100, 100))
for _ in range(1000000):
    a[1:99, 1:99] == 1
print(time.time() - start)
3.2756259441375732
11.044903039932251

That's over 3 times slower. The measurements are approximately the same using timeit.

I'm working on a recursive algorithm (that part is intentional), and this slowdown makes my program take about 10 seconds instead of about 1 second. I just want to know the reason behind it. Maybe it's a bug. I'm using Python 3.9.9. Thanks.

CodePudding user response:

The first is the same as a[2:99] == 1: a (98,100) slice followed by a (97,100) slice, and then the == test.
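A quick check makes the equivalence concrete (a minimal sketch):

```python
import numpy as np

a = np.zeros((100, 100))
# The second [1:99] slices rows of the (98, 100) intermediate view,
# so the chained form selects rows 2..98 of a - the same as a[2:99].
assert a[1:99][1:99].shape == (97, 100)
assert np.shares_memory(a[1:99][1:99], a[2:99])
```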

In [177]: timeit (a[1:99][1:99]==1)
8.51 µs ± 16.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [178]: timeit (a[1:99][1:99])
383 ns ± 5.73 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [179]: timeit (a[1:99])
208 ns ± 10.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

The bulk of the time is the test, not the slicing.

In [180]: a[1:99,1:99].shape
Out[180]: (98, 98)
In [181]: timeit a[1:99,1:99]==1
32.2 µs ± 12.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [182]: timeit a[1:99,1:99]
301 ns ± 3.61 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Again the slicing is a minor part of the timing, but the == test is significantly slower. In the first case we selected a subset of the rows, so the test is on a contiguous block of the data-buffer. In the second we select a subset of rows and columns. Iteration through the data-buffer is more complicated.
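The contiguity difference described above can be checked directly via the array flags (a sketch):

```python
import numpy as np

a = np.zeros((100, 100))
# A row-only slice of a C-order array is one contiguous chunk of the buffer.
print(a[2:99].flags['C_CONTIGUOUS'])        # True
# Slicing columns as well yields a strided, non-contiguous view.
print(a[1:99, 1:99].flags['C_CONTIGUOUS'])  # False
```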

We can simplify the comparison by testing a slice of columns versus a slice of rows:

In [183]: timeit a[:,2:99]==1
32.3 µs ± 13.8 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [184]: timeit a[2:99,:]==1
8.58 µs ± 10.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

As a further test, make a new array in 'F' (column-major) order. Now row slices are the slow ones:

In [189]: b = np.array(a, order='F')
In [190]: timeit b[:,2:99]==1
8.83 µs ± 20.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [191]: timeit b[2:99,:]==1
32.8 µs ± 31.2 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
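The flags confirm that in each layout the fast case is the contiguous one (a sketch):

```python
import numpy as np

a = np.zeros((100, 100))      # C order: rows are contiguous
b = np.array(a, order='F')    # F order: columns are contiguous
print(b[:, 2:99].flags['F_CONTIGUOUS'])  # True  - the fast slice in F order
print(b[2:99, :].flags['F_CONTIGUOUS'])  # False - strided in F order
```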

===

But why compare these two slices at all? One produces a (97,100) array and the other a (98,98) array; they pick different parts of a.

I wonder if you really meant a sequential row slice followed by a column slice, rather than two row slices:

In [193]: timeit (a[1:99][:,1:99]==1)
32.6 µs ± 92.4 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
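These two spellings do select the same block; only the number of intermediate views differs (a sketch):

```python
import numpy as np

a = np.arange(10000, dtype=float).reshape(100, 100)
# Row slice, then column slice on the view, versus the single comma form:
# same elements, same underlying memory.
assert np.array_equal(a[1:99][:, 1:99], a[1:99, 1:99])
assert np.shares_memory(a[1:99][:, 1:99], a[1:99, 1:99])
```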

Comparing just the slicing, we see that the sequential version is slower, but only slightly:

In [194]: timeit (a[1:99][:,1:99])
472 ns ± 3.76 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [195]: timeit (a[1:99,1:99])
306 ns ± 3.19 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
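If the same non-contiguous slice is tested many times, as in the recursive algorithm above, one option worth trying (my suggestion, not something established by the timings here, and only valid while a is unchanged) is to make a one-time contiguous copy with np.ascontiguousarray so the repeated == runs over a contiguous buffer:

```python
import numpy as np

a = np.zeros((100, 100))
view = a[1:99, 1:99]                 # strided, non-contiguous view
block = np.ascontiguousarray(view)   # one-time contiguous copy of the slice
# Subsequent comparisons iterate a contiguous buffer instead of a strided one.
print(block.flags['C_CONTIGUOUS'])   # True
print((block == 1).any())            # False - a is all zeros
```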