Accessing Numpy Array Using Arrays: MATLAB to Python-CodePudding

I'm trying to replicate the following code from MATLAB in Python:

X = [1,2,3;
    4,5,6];
idx = [2,1];
idy = [3,2];

x = X(idx,idy)

Output:

x = 
    6  5
    3  2

I wish it was as simple as:

X = np.array([[1,2,3],[4,5,6]])
idx = np.array([1,0])
idy = np.array([2,1])

x = X[idx,idy]

but that is not the case.

What am I missing?

Second Example:

X = [0.677888793318259  0.141577665623206   0.175278629658347   0.00525491140018830
0.684907327765215   0.137981388019339   0.172732191381956   0.00437909283349025
0.691925862212172   0.134385110415471   0.170185753105565   0.00350327426679220
0.698944396659129   0.130788832811603   0.167639314829174   0.00262745570009415
0.705962931106086   0.127192555207735   0.165092876552782   0.00175163713339610
0.712981465553043   0.123596277603868   0.162546438276391   0.000875818566698050
0.720000000000000   0.120000000000000   0.160000000000000   0]

idx = [1 1 1 1 1 1 0]
idy = [1 1 1 1]

Correct Output:
x = [0.677888793318259  0.141577665623206   0.175278629658347   0.00525491140018830
0.684907327765215   0.137981388019339   0.172732191381956   0.00437909283349025
0.691925862212172   0.134385110415471   0.170185753105565   0.00350327426679220
0.698944396659129   0.130788832811603   0.167639314829174   0.00262745570009415
0.705962931106086   0.127192555207735   0.165092876552782   0.00175163713339610
0.712981465553043   0.123596277603868   0.162546438276391   0.000875818566698050]

CodePudding user response：

MATLAB automatically indexes the cross product.

The numpy equivalent is np.ix_:

Using ix_ one can quickly construct index arrays that will index the cross product. a[np.ix_([1,3],[2,5])] returns the array [[a[1,2] a[1,5]], [a[3,2] a[3,5]]].

>>> X[np.ix_(idx, idy)]
# array([[6, 5],
#        [3, 2]])

Or as Michael commented, you can index each dimension explicitly:

>>> X[idx][:, idy]
# array([[6, 5],
#        [3, 2]])

CodePudding user response：

Integer Indexing

In numpy, indexing with a list of integers is called advanced, or fancy indexing. the shape of the output is the shape of the broadcasted input indices. This is more versatile than the Matlab notation, which forces broadcasting whether you ask for it or not. The matlab equivalent of non-broadcasted indexing is using linear indices, which is much more awkward. To get two index arrays to make a grid, they just need to have their non-unit dimension along the appropriate axis:

X = np.array([
    [1, 2, 3],
    [4, 5, 6]])
idx = np.array([1, 0])
idy = np.array([2, 1])
X[idx[:, None], idy]

The function np.ix_ wraps the reshape to an arbitrary number of axes:

X[np.ix_[idx, idy]]

Boolean Indexing

Your second example uses logical masks, or what in numpy is called boolean indexing, or masking. To tell numpy that you want to use a boolean mask, you have to pass in a boolean array. Using a list of python bools is one way to do it. Another is to explicitly set dtype=bool in the index:

idx = np.array([True, True])
idy = np.array([1, 0, 1], dtype=bool)
X[np.ix_(idx, idy)]

Notice that for this application, np.ix_ converts the indices into integers:

>>> np.ix_(idx, idy)
(array([[0],
        [1]]),
 array([[0, 1, 2]]))

You can write the equivalent expression using np.flatnonzero and the reshape technique used for the first part:

X[np.flatnonzero(idx)[:, None], np.flatnonzero(idy)]

Numpy boolean masks can contain any combination of True and False values, as long as the mask has the same shape as the leading dimensions of the array that it is applied to. That means that you can construct the output mask as idx[:, None] & idy to get the shape you need.

>>> X[idx[:, None] & idy]
array([1, 3, 4, 6])

Because masks can contain any number of elements, the result of indexing this way will be a flat array, unlike the result of integer indexing. This part of the reason that ix_ converts to integers first. You can get around this by counting the number of non-zero elements in each of the axis masks, e.g. with np.count_nonzero, or np.sum:

>>> X[idx[:, None] & idy].reshape(np.count_nonzero(idx), np.sum(idy))
array([[1, 3],
       [4, 6]])