I want to extract parts of a NumPy ndarray based on arrays of index positions for some of the dimensions. Let me show this with an example.
Example data
import numpy as np
dummy = np.random.rand(5,2,100)
X = np.array([[0,1],[4,1],[2,0]])
dummy is the original ndarray with dimensionality 5x2x100. This dimensionality is arbitrary; it could just as well be 5x2x4x100. X is a matrix of index values: X[:,0] holds the indices for the first dimension of dummy, and X[:,1] those for the second dimension. The number of columns in X is always the number of dimensions in dummy minus 1.
Example output
For this example, I want to extract an ndarray of the following form:
[
dummy[0,1,:],
dummy[4,1,:],
dummy[2,0,:]
]
Complications
If the number of dimensions in dummy were fixed, this could simply be done with dummy[X[:,0],X[:,1],:]. Sadly, the dimensionality can vary, e.g. dummy could be a 5x2x4x6x100 ndarray, and X would then correspondingly be 3x4. My attempts at dealing with this have not yielded the desired result:
- dummy[X,:] yields a 3x2x2x100 ndarray for this example, the same as dummy[X].
- Iteratively reducing dummy by doing something like dummy = dummy[X[:,i],:], with i iterating over the columns of X, also does not reduce the ndarray in the example past 3x2x100.
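For reference, the shapes reported for both attempts can be reproduced with a short sketch. The names attempt_1 and attempt_2 are only illustrative and are not part of the original code.

```python
import numpy as np

dummy = np.random.rand(5, 2, 100)
X = np.array([[0, 1], [4, 1], [2, 0]])

# Attempt 1: the whole 3x2 matrix X indexes only the first axis of dummy.
attempt_1 = dummy[X, :]
print(attempt_1.shape)  # (3, 2, 2, 100)

# Attempt 2: each pass indexes the first axis again, so the second axis is never reduced.
attempt_2 = dummy
for i in range(X.shape[1]):
    attempt_2 = attempt_2[X[:, i], :]
print(attempt_2.shape)  # (3, 2, 100)
```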
I have a feeling that this should be pretty simple with numpy indexing, but I guess my search for a solution was missing the right terms for this.
Does anyone have a solution to this?
CodePudding user response:
I will try to explain @Michael Szczesny's answer.
First, notice that if you have an np.array with n dimensions and pass m indexes, where m<n, it is the same as using : for the remaining n-m dimensions. In your case, for example:
dummy[(0, 0)] == dummy[0, 0, :]
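A minimal sketch to check this rule, assuming dummy has the 5x2x100 shape from the question (the names partial and explicit are only illustrative):

```python
import numpy as np

dummy = np.random.rand(5, 2, 100)

# Two indices on a 3-D array: the trailing axis is kept whole.
partial = dummy[(0, 0)]
explicit = dummy[0, 0, :]

print(partial.shape)                      # (100,)
print(np.array_equal(partial, explicit))  # True
```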
Given that, note that you can also pass an array as an index. Thus:
dummy[([0, 1], [0, 0])]
is the same as:
np.array([dummy[(0,0)], dummy[(1,0)]])
You can validate that using:
dummy[([0, 1], [0, 0])] == np.array([dummy[(0,0)], dummy[(1,0)]])
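Note that == compares element by element and returns a boolean array of shape 2x100. If you prefer a single True/False, np.array_equal can be used instead. A small sketch, again assuming the 5x2x100 dummy from the question (fancy and stacked are illustrative names):

```python
import numpy as np

dummy = np.random.rand(5, 2, 100)

# One index list per dimension: picks dummy[0, 0] and dummy[1, 0].
fancy = dummy[([0, 1], [0, 0])]
stacked = np.array([dummy[(0, 0)], dummy[(1, 0)]])

print((fancy == stacked).shape)        # (2, 100)
print(np.array_equal(fancy, stacked))  # True
```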
Finally, notice that:
(*X.T,)
# (array([0, 4, 2]), array([1, 1, 0]))
Here you get the indices for each dimension as a separate array, so indexing dummy with this tuple gives:
[
dummy[0,1],
dummy[4,1],
dummy[2,0]
]
Which is the same as:
[
dummy[0,1,:],
dummy[4,1,:],
dummy[2,0,:]
]
Edit: Instead of (*X.T,), you can use tuple(X.T), which makes more sense to me.
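Putting it together, here is a sketch of how this could look for an arbitrary number of dimensions. The helper gather_leading and the second index matrix X4 are made up for illustration; only dummy and X come from the question.

```python
import numpy as np

def gather_leading(arr, idx):
    """Hypothetical helper: each row of idx holds one index per leading axis of arr."""
    idx = np.asarray(idx)
    # tuple(idx.T) yields one index array per indexed dimension.
    return arr[tuple(idx.T)]

# The 5x2x100 example from the question.
dummy = np.random.rand(5, 2, 100)
X = np.array([[0, 1], [4, 1], [2, 0]])
out = gather_leading(dummy, X)
print(out.shape)  # (3, 100)

# The 5x2x4x6x100 case with a 3x4 index matrix (values chosen arbitrarily).
dummy5 = np.random.rand(5, 2, 4, 6, 100)
X4 = np.array([[0, 1, 3, 5], [4, 1, 0, 2], [2, 0, 1, 1]])
out5 = gather_leading(dummy5, X4)
print(out5.shape)  # (3, 100)
```

The same call works for both shapes because the number of index arrays is taken from the number of columns in the index matrix.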
CodePudding user response:
As Michael Szczesny wrote, the best solution is dummy[(*X.T,)].
Since X[:,0] holds the indices for the first dimension of dummy and X[:,1] holds the indices for the second dimension, transposing X (X.T) gives you the indices for the first dimension of dummy as X.T[0] and those for the second dimension as X.T[1].
Now, to slice dummy as you want, you can specify the indices of the first and second dimensions like this:
dummy[(first_dim_indices, second_dim_indices)]  # i.e. dummy[(X.T[0], X.T[1])]
To simplify the code (and since you don't want to transpose the X matrix twice), you can unpack X.T into a tuple as (*X.T,), so writing dummy[(*X.T,)] is the same as writing dummy[(X.T[0], X.T[1])].
This notation is also useful if you have a variable number of dimensions to index, because X.T unpacks into as many rows as there are dimensions to index in dummy. For example, suppose you want to retrieve a 1D array from dummy given the following indices:
first_dim: (0, 4, 2)
second_dim: (1, 1, 0)
third_dim: (9, 8, 7)
You can specify the indices of the 3 dimensions as X = np.array([[0,1,9],[4,1,8],[2,0,7]]), and dummy[(*X.T,)] is still valid.
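A quick sketch of this last case, assuming dummy is the 5x2x100 array from the question (values and expected are illustrative names):

```python
import numpy as np

dummy = np.random.rand(5, 2, 100)
X = np.array([[0, 1, 9], [4, 1, 8], [2, 0, 7]])

# One index per dimension for every row of X, so each row selects a single element.
values = dummy[(*X.T,)]
expected = np.array([dummy[0, 1, 9], dummy[4, 1, 8], dummy[2, 0, 7]])

print(values.shape)                      # (3,)
print(np.array_equal(values, expected))  # True
```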