Home > front end >  Getting unexpected shape while slicing a numpy array
Getting unexpected shape while slicing a numpy array

Time:05-28

I have a 4D numpy array. While slicing for multiple indices in a single dimension, my axis get interchanged. Am I missing something trivial here.

import numpy as np
from smartprint import smartprint as prints

a = np.random.rand(50, 60, 70, 80)
b = a[:, :, :, [2,3,4]]
prints (b.shape)  # this works as expected 

c = a[1, :, :, [2,3,4]]
prints (c.shape)  # here, I see the axes are interchanged

Output:

b.shape : (50, 60, 70, 3)
c.shape : (3, 60, 70)

CodePudding user response:

Here are some observations that may help explain the problem.

Start with a 3d array, with the expect strides:

In [158]: x=np.arange(24).reshape(2,3,4)
In [159]: x.shape,x.strides
Out[159]: ((2, 3, 4), (48, 16, 4))

Advanced indexing on the last axis:

In [160]: y=x[:,:,[0,1,2,3]]    
In [161]: y.shape, y.strides
Out[161]: ((2, 3, 4), (12, 4, 24))

Notice that the strides are not in the normal C-contiguous order. For a 2d array we'd describe this a F-contiguous. It's an obscure indexing detail that usually doesn't matter.

Apparently when doing this indexing it first makes an array with the last, the indexed dimension, first:

In [162]: y.base.shape
Out[162]: (4, 2, 3)
In [163]: y.base.strides
Out[163]: (24, 12, 4)

y is this base with swapped axes, a view of its base.

The case with a slice in the middle is

In [164]: z=x[1,:,[0,1,2,3]]
In [165]: z.shape, z.strides
Out[165]: ((4, 3), (12, 4))

In [166]: z.base    # its own base, not a view

Transposing z to the expected (3,4) shape would switch the strides to (4,12), F-contiguous.

With the two step indexing, we get an array with the expect shape, but the F strides. And its base looks a lot like z.

In [167]: w=x[1][:,[0,1,2,3]]

In [168]: w.shape, w.strides
Out[168]: ((3, 4), (4, 12))

In [169]: w.base.shape, w.base.strides
Out[169]: ((4, 3), (12, 4))

The docs justify the switch in axes by saying that there's an ambiguity when performing advanced indexing with a slice in the middle. It's perhaps clearest when using a (2,1) and (4,) indices:

In [171]: w=x[[[0],[1]],:,[0,1,2,3]]

In [172]: w.shape, w.strides
Out[172]: ((2, 4, 3), (48, 12, 4))

The middle, size 3 dimension, is "tacked on last". With x[1,:,[0,1,2,3]] that ambibuity argument isn't as good, but apparently it's using the same indexing method. When this was raised in github issues, the claim was that reworking the indexing to correct this was too difficult. Individual cases might be corrected, but a comprehensive change was too complicated.

This dimension switch seems to come up on SO a couple of times a year, an annoyance, but not a critical issue.

  • Related