When I use ":n" or "m:" as arguments to np.r_, I get unexpected results that I don't understand.
Here's my code:
import numpy as np
B = np.arange(180).reshape(6,30)
C = B[:, np.r_[10:15, 20:26]]
D = C[:, np.r_[0:3,8:11]]
Now all of that worked as expected. C prints as:
array([[ 10, 11, 12, 13, 14, 20, 21, 22, 23, 24, 25],
[ 40, 41, 42, 43, 44, 50, 51, 52, 53, 54, 55],
[ 70, 71, 72, 73, 74, 80, 81, 82, 83, 84, 85],
[100, 101, 102, 103, 104, 110, 111, 112, 113, 114, 115],
[130, 131, 132, 133, 134, 140, 141, 142, 143, 144, 145],
[160, 161, 162, 163, 164, 170, 171, 172, 173, 174, 175]])
and D is:
array([[ 10, 11, 12, 23, 24, 25],
[ 40, 41, 42, 53, 54, 55],
[ 70, 71, 72, 83, 84, 85],
[100, 101, 102, 113, 114, 115],
[130, 131, 132, 143, 144, 145],
[160, 161, 162, 173, 174, 175]])
However, when I remove the "0" and the "11," I don't understand what happens, and I haven't been able to find any explanation in any Numpy indexing or r_ documentation. Here's the new line of code:
E = C[:, np.r_[:3, 8:]]
It's just the same expression that defined the D array with "unnecessary" indices removed. However, the results are mystifying:
array([[ 10, 11, 12, 10, 11, 12, 13, 14, 20, 21, 22],
[ 40, 41, 42, 40, 41, 42, 43, 44, 50, 51, 52],
[ 70, 71, 72, 70, 71, 72, 73, 74, 80, 81, 82],
[100, 101, 102, 100, 101, 102, 103, 104, 110, 111, 112],
[130, 131, 132, 130, 131, 132, 133, 134, 140, 141, 142],
[160, 161, 162, 160, 161, 162, 163, 164, 170, 171, 172]])
I expected E to be identical to D, with just six columns. What's going on? Is this behavior documented somewhere, or is this a bug?
CodePudding user response:
The answer is that Numpy indexing does not work like Python indexing. For some reason, it is different, and one has to know what the last index is to get the items from n to last and use [n:last] instead of [n:]. IMHO, this defeats one of the better features of Python, not having to call some sort of shape or size function to get your indices correct.
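For what it's worth, a quick check suggests the limitation applies only inside np.r_: basic NumPy slicing on the array itself does accept an open-ended [n:]. The sketch below reuses B, C and D from the question and joins two plain column slices with np.concatenate instead of building an index array; treat it as one possible workaround, not the only one.

```python
import numpy as np

B = np.arange(180).reshape(6, 30)
C = B[:, np.r_[10:15, 20:26]]

# Basic slicing resolves the open end from C's own shape.
head = C[:, :3]   # columns 0, 1, 2
tail = C[:, 8:]   # columns 8, 9, 10

# Join the two column blocks without going through np.r_.
E = np.concatenate([head, tail], axis=1)

# D is the version from the question, with explicit bounds.
D = C[:, np.r_[0:3, 8:11]]
print(E.shape)               # (6, 6)
print(np.array_equal(E, D))  # True
```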
CodePudding user response:
To understand the difference between D and E, we have to look at what np.r_ produces. As with function calls, the 'contents' of an index expression, if complex, are evaluated first.
In [112]: D = C[:, np.r_[0:3,8:11]]; D.shape
Out[112]: (6, 6)
In [113]: E = C[:, np.r_[:3,8:]]; E.shape
Out[113]: (6, 11)
The two r_ expressions, evaluated on their own:
In [115]: np.r_[0:3,8:11]
Out[115]: array([ 0, 1, 2, 8, 9, 10])
In [116]: np.r_[:3,8:]
Out[116]: array([0, 1, 2, 0, 1, 2, 3, 4, 5, 6, 7])
r_ is an instance of a class defined in np.lib.index_tricks. That class has its own __getitem__ method, which lets us use indexing notation, but the actual work is a call to np.concatenate.
We can see what r_ receives by using another index_tricks object, s_:
In [117]: np.s_[0:3, 8:11]
Out[117]: (slice(0, 3, None), slice(8, 11, None))
In [118]: np.s_[:3, 8:]
Out[118]: (slice(None, 3, None), slice(8, None, None))
If we define a simple function:
def foo(aslice):
    return np.arange(aslice.start, aslice.stop, aslice.step)
we can test the different slices:
In [124]: foo(np.s_[8:11]) # np.arange(8,11)
Out[124]: array([ 8, 9, 10])
In [125]: foo(np.s_[8:]) # np.arange(8)
Out[125]: array([0, 1, 2, 3, 4, 5, 6, 7])
Remember that when we give arange just one number, it's understood to be the 'stop', with an implicit start of 0. That's the same as with Python's built-in range.
np.r_ actually uses:
In [105]: def foo1(item):
     ...:     step = item.step
     ...:     start = item.start
     ...:     stop = item.stop
     ...:     if start is None:
     ...:         start = 0
     ...:     if step is None:
     ...:         step = 1
     ...:     return np.arange(start, stop, step)
but this just lets us use np.r_[:3] instead of np.r_[0:3]. It doesn't change the [8:] case.
In case it isn't clear: A[i,j] is translated by the interpreter into A.__getitem__((i,j)), a function call. The interpreter also converts any ':' expression into a slice(...) object, as illustrated by s_.
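To make that translation concrete, here is a minimal sketch of a class whose __getitem__ simply returns whatever key it was given. ShowIndex is a made-up name for illustration; it behaves much like np.s_ for this purpose, and it shows that the object being indexed only ever sees slice objects, with None standing in for any omitted bound.

```python
class ShowIndex:
    """Hypothetical helper: echo back the key passed to __getitem__."""

    def __getitem__(self, key):
        return key


show = ShowIndex()

# Explicit bounds arrive as integers inside the slice objects.
explicit = show[0:3, 8:11]
print(explicit)    # (slice(0, 3, None), slice(8, 11, None))

# Omitted bounds arrive as None; nothing here knows about any array length.
open_ended = show[:3, 8:]
print(open_ended)  # (slice(None, 3, None), slice(8, None, None))
```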
After converting the slices into arrays with np.arange (or np.linspace, for 'complex' steps), r_ does a concatenate. So your two r_ expressions are really:
In [128]: np.concatenate([np.arange(0,3), np.arange(8,11)]) # [115]
Out[128]: array([ 0, 1, 2, 8, 9, 10])
In [129]: np.concatenate([np.arange(0,3), np.arange(8,None)]) # [116]
Out[129]: array([0, 1, 2, 0, 1, 2, 3, 4, 5, 6, 7])
I suppose one could argue that np.r_[8:] should raise an error, since it provides a start without a stop, and thus can't be evaluated as it would be in a real indexing case. As coded, it works because of the default behavior of np.arange.
edit
When I use '8:' directly, C can deduce the correct stop from its own shape:
In [140]: C.shape
Out[140]: (6, 11)
In [141]: C[:,8:].shape
Out[141]: (6, 3)
But an np.r_ object does not have a shape, nor can it deduce the shape from C:
In [142]: np.r_.shape
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Input In [142], in <cell line: 1>()
----> 1 np.r_.shape
AttributeError: 'RClass' object has no attribute 'shape'
If you want to avoid the explicit 11, you have to use:
In [143]: C[:, np.r_[8:C.shape[1]]].shape
Out[143]: (6, 3)
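If typing C.shape[1] inside every r_ gets tedious, one option is a small wrapper that resolves open-ended slices against a known axis length before concatenating. The sketch below is only an illustration: r_for_length is a hypothetical name, not part of NumPy, and it relies on Python's built-in slice.indices(length), which fills in missing bounds the same way ordinary sequence slicing does.

```python
import numpy as np


def r_for_length(length, *slices):
    """Hypothetical helper: expand slices against a known axis length.

    slice.indices(length) returns concrete (start, stop, step) values,
    so an open-ended slice such as 8: gets stop == length.
    """
    return np.concatenate([np.arange(*s.indices(length)) for s in slices])


B = np.arange(180).reshape(6, 30)
C = B[:, np.r_[10:15, 20:26]]

# np.s_ is used here only to write the slices in the usual notation.
idx = r_for_length(C.shape[1], np.s_[:3], np.s_[8:])
print(idx)       # [ 0  1  2  8  9 10]

E = C[:, idx]
print(E.shape)   # (6, 6)
```

The length still has to come from somewhere, but it is passed once instead of being repeated in each open-ended slice.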