Array based indexing of an ndarray

I am not understanding numpy.take though it seems like it is the function I want. I have an ndarray and I want to use another ndarray to index into the first.

import numpy as np

# Create a matrix
A = np.arange(75).reshape((5,5,3))

# Create the index array
idx = np.array([[1, 0, 0, 1, 1],
                [1, 1, 0, 1, 1],
                [1, 0, 1, 0, 1],
                [1, 1, 0, 0, 0],
                [1, 1, 1, 1, 0]])

Given the above, I want to index A by the values in idx. I thought take does this, but it doesn't output what I expected.

# Index the 3rd dimension of the A matrix by the idx array.

Asub = np.take(A, idx)

print(f'Value in A at 1,1,1 is {A[1,1,1]}')
print(f'Desired index from idx {idx[1,1]}')

print(f'Value in Asub at [1,1,1] {Asub[1,1]} <- thought this would be 19')

I was expecting each entry of Asub to be the value in A selected by the corresponding entry of idx. Instead, the output is:

Value in A at 1,1,1 is 19
Desired index from idx 1
Value in Asub at [1,1,1] 1 <- thought this would be 19
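
A note on that output, as a minimal sketch: when np.take is called without an axis argument, it indexes into the flattened array. Since the flattened A begins 0, 1, 2, ..., the result just repeats the values of idx. The variable names flat_take, same_as_flat, repeats_idx and axis_take_shape are only illustrative.

```python
import numpy as np

A = np.arange(75).reshape((5, 5, 3))
idx = np.array([[1, 0, 0, 1, 1],
                [1, 1, 0, 1, 1],
                [1, 0, 1, 0, 1],
                [1, 1, 0, 0, 0],
                [1, 1, 1, 1, 0]])

# With no axis argument, np.take indexes the flattened array,
# so this is the same as A.ravel()[idx].
flat_take = np.take(A, idx)
same_as_flat = np.array_equal(flat_take, A.ravel()[idx])

# A.ravel() starts 0, 1, 2, ..., so the result simply repeats idx.
repeats_idx = np.array_equal(flat_take, idx)

# With axis=2, every value in idx selects a whole slice A[:, :, k],
# so the result has shape (5, 5) + (5, 5), not one value per (i, j).
axis_take_shape = np.take(A, idx, axis=2).shape

print(same_as_flat, repeats_idx, axis_take_shape)
```

Neither form gives one value per (i, j) position, which is why a different indexing approach is needed here.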

CodePudding user response：

One possibility is to create row and column indices that broadcast against idx, which indexes the third dimension, i.e. a (5,1) array and a (5,) array that pair with the (5,5) idx:

In [132]: A[np.arange(5)[:,None],np.arange(5), idx]
Out[132]: 
array([[ 1,  3,  6, 10, 13],
       [16, 19, 21, 25, 28],
       [31, 33, 37, 39, 43],
       [46, 49, 51, 54, 57],
       [61, 64, 67, 70, 72]])

This ends up picking values only from A[:,:,0] and A[:,:,1], because idx contains just 0 and 1. The values of idx are used as integer indices into the last axis (valid values are 0, 1, 2 for a dimension of size 3); they aren't boolean selectors.

Out[132][1,1] is 19, same as A[1,1,1]; Out[132][1,2] is the same as A[1,2,0].
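
To make the broadcasting explicit, here is a small sketch of the same indexing with the shapes spelled out. The names rows, cols, index_shape and picked are mine, chosen for illustration.

```python
import numpy as np

A = np.arange(75).reshape((5, 5, 3))
idx = np.array([[1, 0, 0, 1, 1],
                [1, 1, 0, 1, 1],
                [1, 0, 1, 0, 1],
                [1, 1, 0, 0, 0],
                [1, 1, 1, 1, 0]])

rows = np.arange(5)[:, None]   # shape (5, 1)
cols = np.arange(5)            # shape (5,)

# The three index arrays broadcast together to one common shape,
# which is also the shape of the result.
index_shape = np.broadcast(rows, cols, idx).shape

# picked[i, j] is A[i, j, idx[i, j]]
picked = A[rows, cols, idx]

print(index_shape)                 # (5, 5)
print(picked[1, 1], A[1, 1, 1])    # both 19
print(picked[1, 2], A[1, 2, 0])    # both 21
```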

take_along_axis gets the same values, but with an added dimension:

In [142]: np.take_along_axis(A, idx[:,:,None], 2).shape
Out[142]: (5, 5, 1)

In [143]: np.take_along_axis(A, idx[:,:,None], 2)[:,:,0]
Out[143]: 
array([[ 1,  3,  6, 10, 13],
       [16, 19, 21, 25, 28],
       [31, 33, 37, 39, 43],
       [46, 49, 51, 54, 57],
       [61, 64, 67, 70, 72]])
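
As a sketch of why the extra dimension appears: take_along_axis expects an index array with the same number of dimensions as A, so idx gets a trailing axis going in and the result carries it coming out. Removing it with squeeze is one option, equivalent here to the [:,:,0] used above. The names idx3, taken and matches are illustrative.

```python
import numpy as np

A = np.arange(75).reshape((5, 5, 3))
idx = np.array([[1, 0, 0, 1, 1],
                [1, 1, 0, 1, 1],
                [1, 0, 1, 0, 1],
                [1, 1, 0, 0, 0],
                [1, 1, 1, 1, 0]])

# The index array must have the same ndim as A, hence the trailing axis.
idx3 = idx[:, :, None]                        # shape (5, 5, 1)
taken = np.take_along_axis(A, idx3, axis=2)   # shape (5, 5, 1)

# Drop the length-1 axis to get back a (5, 5) result.
Asub = taken.squeeze(axis=2)

# Same values as the broadcast integer indexing.
matches = np.array_equal(Asub, A[np.arange(5)[:, None], np.arange(5), idx])

print(taken.shape, Asub.shape, matches)
```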

The iterative equivalent might be easier to understand:

In [145]: np.array([[A[i,j,idx[i,j]] for j in range(5)] for i in range(5)])
Out[145]: 
array([[ 1,  3,  6, 10, 13],
       [16, 19, 21, 25, 28],
       [31, 33, 37, 39, 43],
       [46, 49, 51, 54, 57],
       [61, 64, 67, 70, 72]])

If you have trouble expressing an action in "vectorized" array ways, go ahead and write an iterative version. It will avoid a lot of ambiguity and misunderstanding.
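
One way to use that advice, sketched under my own assumptions: keep the loop as a reference and compare a vectorized candidate against it on random data. The helpers pick_loop and pick_vectorized are hypothetical names, not part of NumPy.

```python
import numpy as np

def pick_loop(A, idx):
    """Hypothetical reference helper: out[i, j] = A[i, j, idx[i, j]]."""
    out = np.empty(idx.shape, dtype=A.dtype)
    for i, j in np.ndindex(*idx.shape):
        out[i, j] = A[i, j, idx[i, j]]
    return out

def pick_vectorized(A, idx):
    """Hypothetical vectorized candidate using broadcast integer indexing."""
    rows = np.arange(idx.shape[0])[:, None]
    cols = np.arange(idx.shape[1])
    return A[rows, cols, idx]

# Random test data; the shapes are arbitrary, chosen for illustration.
rng = np.random.default_rng(0)
A = rng.integers(0, 100, size=(4, 6, 3))
idx = rng.integers(0, 3, size=(4, 6))

agree = np.array_equal(pick_loop(A, idx), pick_vectorized(A, idx))
print(agree)
```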

Another way to get the same values, treating the idx values as True/False booleans, is:

In [146]: np.where(idx, A[:,:,1], A[:,:,0])
Out[146]: 
array([[ 1,  3,  6, 10, 13],
       [16, 19, 21, 25, 28],
       [31, 33, 37, 39, 43],
       [46, 49, 51, 54, 57],
       [61, 64, 67, 70, 72]])
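
A caveat worth sketching: the np.where form matches only while idx holds nothing but 0 and 1. The modified array idx2 below is a made-up example with a single 2 in it, which np.where treats as True while integer indexing selects A[:,:,2].

```python
import numpy as np

A = np.arange(75).reshape((5, 5, 3))
idx = np.array([[1, 0, 0, 1, 1],
                [1, 1, 0, 1, 1],
                [1, 0, 1, 0, 1],
                [1, 1, 0, 0, 0],
                [1, 1, 1, 1, 0]])

# Hypothetical variation: one entry now asks for the third slice.
idx2 = idx.copy()
idx2[0, 0] = 2

# np.where only sees truthy/falsy, so 2 behaves like 1.
by_where = np.where(idx2, A[:, :, 1], A[:, :, 0])

# Integer indexing uses the actual value, so 2 picks A[0, 0, 2].
by_index = A[np.arange(5)[:, None], np.arange(5), idx2]

print(by_where[0, 0], by_index[0, 0])   # 1 2
```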

CodePudding user response：

IIUC, you can get the resulting array by broadcasting idx to the shape of A, multiplying the two, and then taking index 1 along the last axis:

Asub = (A * idx[:, :,  None])[:, :, 1]    # --> Asub[1, 1] = 19

# [[ 1  0  0 10 13]
#  [16 19  0 25 28]
#  [31  0 37  0 43]
#  [46 49  0  0  0]
#  [61 64 67 70  0]]

I think it would be the fastest way (or one of the best), particularly for large arrays.
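
For comparison with the indexing approaches in the other response, a short sketch of how the results differ: multiplying by idx masks A[:,:,1], leaving 0 wherever idx is 0, whereas integer indexing returns A[i,j,0] at those positions. Which one is wanted depends on the intended meaning of idx; the names masked and selected are illustrative.

```python
import numpy as np

A = np.arange(75).reshape((5, 5, 3))
idx = np.array([[1, 0, 0, 1, 1],
                [1, 1, 0, 1, 1],
                [1, 0, 1, 0, 1],
                [1, 1, 0, 0, 0],
                [1, 1, 1, 1, 0]])

# Multiplication: keeps A[:, :, 1] where idx is 1, zero elsewhere.
masked = (A * idx[:, :, None])[:, :, 1]

# Integer indexing: A[i, j, idx[i, j]] at every position.
selected = A[np.arange(5)[:, None], np.arange(5), idx]

# The two agree where idx is 1 and differ where idx is 0.
agree_where_one = np.array_equal(masked[idx == 1], selected[idx == 1])

print(agree_where_one)
print(masked[0, 1], selected[0, 1])   # 0 3
```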