I want to extract parts of a NumPy ndarray based on arrays of index positions for some of its dimensions. Let me show this with an example.

Example data

import numpy as np
dummy = np.random.rand(5,2,100)
X = np.array([[0,1],[4,1],[2,0]])

dummy is the original ndarray with shape 5x2x100. This shape is arbitrary; it could just as well be 5x2x4x100.
X is a matrix of index values: X[:,0] holds the indices into the first dimension of dummy, and X[:,1] those into the second dimension. The number of columns in X is always the number of dimensions of dummy minus 1.

Example output

I want to extract an ndarray of the following form for this example

[
  dummy[0,1,:],
  dummy[4,1,:],
  dummy[2,0,:]
]

Complications

If the number of dimensions in dummy were fixed, this could simply be done with dummy[X[:,0],X[:,1],:]. Sadly, the dimensionality can vary, e.g. dummy could be a 5x2x4x6x100 ndarray, and X would then correspondingly be 3x4. My attempts at dealing with this have not yielded the desired result.

dummy[X,:] yields a 3x2x2x100 ndarray for this example, the same as dummy[X].
Iteratively reducing dummy with something like dummy = dummy[X[:,i],:], where i iterates over the columns of X, also does not reduce the ndarray in the example past 3x2x100.
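
For reference, the shapes described above can be reproduced with a short sketch. The variable names (shape_direct, shape_sliced, reduced, shape_fixed) are only illustrative and not part of any API.

```python
import numpy as np

dummy = np.random.rand(5, 2, 100)
X = np.array([[0, 1], [4, 1], [2, 0]])

# Attempt 1: X as a single index array only indexes the first dimension,
# so the (3, 2) shape of X replaces the leading 5.
shape_direct = dummy[X].shape
shape_sliced = dummy[X, :].shape

# Attempt 2: each step indexes the first dimension again,
# so the second dimension is never reduced.
reduced = dummy
for i in range(X.shape[1]):
    reduced = reduced[X[:, i], :]

# The fixed-dimensionality version that gives the desired result.
shape_fixed = dummy[X[:, 0], X[:, 1], :].shape

print(shape_direct, shape_sliced, reduced.shape, shape_fixed)
```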

I have a feeling that this should be pretty simple with numpy indexing, but I guess my search for a solution was missing the right terms for this.
Does anyone have a solution to this?

CodePudding user response：

I will try to explain @Michael Szczesny's answer.

First, notice that if you index an np.array that has n dimensions with only m indices, where m<n, the result is the same as using : for the remaining n-m dimensions. In your case, for example:

dummy[(0, 0)] == dummy[0, 0, :]
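
As a small sketch of that rule, the spellings below should all select the same 100 values from the 5x2x100 array in the question. The names a, b, c and d are placeholders chosen for this illustration.

```python
import numpy as np

dummy = np.random.rand(5, 2, 100)

a = dummy[(0, 0)]      # tuple of two indices
b = dummy[0, 0]        # same thing without the explicit tuple
c = dummy[0, 0, :]     # trailing dimension spelled out
d = dummy[0, 0, ...]   # Ellipsis stands for all remaining dimensions

print(a.shape)
print(np.array_equal(a, b) and np.array_equal(a, c) and np.array_equal(a, d))
```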

Given that, note that you can also pass a list or an array as the index for each dimension. Thus:

dummy[([0, 1], [0, 0])]

is the same as:

np.array([dummy[(0,0)], dummy[(1,0)]])

You can validate that using:

dummy[([0, 1], [0, 0])] == np.array([dummy[(0,0)], dummy[(1,0)]])
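
Note that == on arrays compares element by element and returns a boolean array of shape (2, 100) rather than a single True or False. If a single answer is wanted, one option is np.array_equal, sketched below; fancy and stacked are just illustrative names.

```python
import numpy as np

dummy = np.random.rand(5, 2, 100)

# One index list per dimension: picks dummy[0, 0] and dummy[1, 0].
fancy = dummy[([0, 1], [0, 0])]

# The same two rows gathered one at a time.
stacked = np.array([dummy[(0, 0)], dummy[(1, 0)]])

print(fancy.shape)
print(np.array_equal(fancy, stacked))
```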

Finally, notice that:

(*X.T,)
# (array([0, 4, 2]), array([1, 1, 0]))

Here each column of X becomes its own index array, one per indexed dimension, so indexing dummy with this tuple gives:

[
  dummy[0,1],
  dummy[4,1],
  dummy[2,0]
]

Which is the same as:

[
  dummy[0,1,:],
  dummy[4,1,:],
  dummy[2,0,:]
]

Edit: Instead of (*X.T,), you can use tuple(X.T), which to me reads more clearly.
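
Putting this together, here is a minimal sketch of a helper that works for any number of leading dimensions. The function name take_leading and the arrays big and X4 are made up for this example; the only NumPy behaviour relied on is indexing with a tuple of index arrays.

```python
import numpy as np

def take_leading(arr, idx):
    """Hypothetical helper: for each row k of idx, return
    arr[idx[k, 0], idx[k, 1], ...] with trailing dimensions kept."""
    idx = np.asarray(idx)
    # tuple(idx.T) yields one 1D index array per indexed dimension.
    return arr[tuple(idx.T)]

# The example from the question.
dummy = np.random.rand(5, 2, 100)
X = np.array([[0, 1], [4, 1], [2, 0]])
out = take_leading(dummy, X)
expected = np.stack([dummy[0, 1, :], dummy[4, 1, :], dummy[2, 0, :]])
print(out.shape, np.array_equal(out, expected))

# The higher-dimensional case: 5x2x4x6x100 with a 3x4 index matrix.
big = np.random.rand(5, 2, 4, 6, 100)
X4 = np.array([[0, 1, 3, 5], [4, 1, 0, 2], [2, 0, 1, 1]])
out4 = take_leading(big, X4)
print(out4.shape)
```

The same call works in both cases because the number of index arrays in the tuple follows the number of columns in the index matrix.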

CodePudding user response：

As Michael Szczesny wrote, the best solution is dummy[(*X.T,)].

Since X[:,0] holds the indices of the first dimension of dummy and X[:,1] the indices of the second dimension, transposing X (X.T) gives you the indices of the first dimension of dummy as X.T[0] and the indices of the second dimension as X.T[1].

Now to slice dummy as you want, you can specify the indices of the first and of the second dimension in this way:

dummy[(first_dim_indices, second_dim_indices)]  # i.e. dummy[(X.T[0], X.T[1])]

To simplify the code (and since you don't want to transpose X twice), you can unpack X.T into a tuple as (*X.T,), so writing dummy[(*X.T,)] is the same as writing dummy[(X.T[0], X.T[1])].

This notation is also useful when the number of dimensions to index is not fixed, because unpacking X.T yields as many index arrays as there are dimensions to index in dummy. For example, suppose you want to retrieve a 1D array from dummy given the following indices:

first_dim:  (0, 4, 2)
second_dim: (1, 1, 0)
third_dim:  (9, 8, 7)

You can specify the indices of the 3 dimensions as X = np.array([[0,1,9],[4,1,8],[2,0,7]]), and dummy[(*X.T,)] is still valid.
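
As a quick check of this three-column case, the sketch below assumes the same 5x2x100 dummy as in the question; result and expected are illustrative names. With one index per dimension, every row of X selects a single scalar, so the result is a 1D array with three entries.

```python
import numpy as np

dummy = np.random.rand(5, 2, 100)
X = np.array([[0, 1, 9], [4, 1, 8], [2, 0, 7]])

# Three index arrays, one per dimension of dummy.
result = dummy[(*X.T,)]

# The same three scalars picked one at a time.
expected = np.array([dummy[0, 1, 9], dummy[4, 1, 8], dummy[2, 0, 7]])

print(result.shape)
print(np.array_equal(result, expected))
```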