Say I have a multi-dimensional array:
np.array([[1, 0, 0], [0, 0, 1]])
And I want to extract values given an additional list of indices:
np.array([0, 2])
Where the expected output is:
[1, 1]
What's the best way to approach this?
Here's one way to do it:
>>> import numpy as np
>>> x = np.array([[1, 0, 0], [0, 0, 1]])
>>> desired_cols = np.array([0, 2])
>>> desired_rows = np.arange(len(desired_cols))
>>> x[desired_rows, desired_cols]
array([1, 1])
Your np.array([0, 2]) on its own doesn't provide enough information to index into a 2D array. Based on your example, I'm assuming those are the columns you want to select, one per row. Since you also need to supply the corresponding rows, I've created them with np.arange, based on the length of the desired columns.
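As an alternative not used in the answer above, NumPy's np.take_along_axis can pick one column per row without building the row indices by hand. This is a sketch assuming the column indices come as a 1D array with exactly one entry per row of x:

```python
import numpy as np

x = np.array([[1, 0, 0], [0, 0, 1]])
desired_cols = np.array([0, 2])

# Approach from the answer: pair each row index with its column index.
desired_rows = np.arange(len(desired_cols))
picked = x[desired_rows, desired_cols]

# Alternative: take_along_axis needs indices with the same number of
# dimensions as x, so add a trailing axis, then flatten the (2, 1) result.
picked_alt = np.take_along_axis(x, desired_cols[:, None], axis=1).ravel()

print(picked)      # [1 1]
print(picked_alt)  # [1 1]
```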
Selecting single elements
In general, advanced (integer-array) indexing takes one sequence of indices per axis:
x[axis_0_idxs, axis_1_idxs, ...]
If you were to zip(axis_0_idxs, axis_1_idxs, ...), you'd get the coordinate tuples of the selected elements. For example, with the indices used for your problem:
>>> list(zip(desired_rows.tolist(), desired_cols.tolist()))
[(0, 0), (1, 2)]
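To make the coordinate-tuple view concrete, here is a small sketch (the 3D array arr and its index lists are made-up examples) that indexes with one list per axis and checks the result against a plain loop over the zipped coordinates:

```python
import numpy as np

# Hypothetical 3D array where arr[i, j, k] == 12*i + 4*j + k
arr = np.arange(24).reshape(2, 3, 4)

axis_0_idxs = [0, 1]
axis_1_idxs = [2, 0]
axis_2_idxs = [3, 1]

# Advanced indexing: one index list per axis.
fancy = arr[axis_0_idxs, axis_1_idxs, axis_2_idxs]

# The same thing spelled out: zip the lists into coordinate tuples
# and look up each element individually.
coords = list(zip(axis_0_idxs, axis_1_idxs, axis_2_idxs))
looped = np.array([arr[c] for c in coords])

print(coords)  # [(0, 2, 3), (1, 0, 1)]
print(fancy)   # [11 13]
```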
Selecting subarrays
If, however, you want every combination of the desired rows and the desired columns (that is, a rectangular subarray), you can use np.ix_(). Here's a more complex example (the values are random, so yours will differ):
>>> x = np.random.randint(0, 100, (5, 5), dtype="uint8")
>>> x
array([[87, 57, 64, 48, 15],
       [72,  8,  0, 81, 63],
       [63, 51, 66,  0, 68],
       [77, 46, 74, 74, 86],
       [51, 59, 48, 81, 75]], dtype=uint8)
Suppose we want the subarray made up of rows 1, 2, and 3 and columns 0, 2, and 4. If we index into x with plain lists, we instead get an array of just three items:
>>> rows = [1, 2, 3]
>>> cols = [0, 2, 4]
>>> x[rows, cols]
array([72, 66, 86], dtype=uint8)
This is because the two 1D lists are, again, effectively zipped into coordinate tuples: (1, 0), (2, 2), and (3, 4). To get the subarray made up of rows 1, 2, 3 and columns 0, 2, 4, we need every one of the columns for each of the rows. In a sense, that is the Cartesian product of the rows and columns, but a Cartesian product is still just a 1D sequence of coordinate tuples, so indexing with it would give the correct values in a 1D array rather than a 3x3 one.
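The Cartesian-product point can be checked directly. This sketch uses itertools.product (not part of the original answer) to build all nine coordinate tuples: indexing with them returns the right nine values, but as a flat 1D array that still has to be reshaped. It uses a fixed copy of the random x above so the result is reproducible:

```python
import itertools
import numpy as np

# Fixed copy of the 5x5 array shown above.
x = np.array([[87, 57, 64, 48, 15],
              [72,  8,  0, 81, 63],
              [63, 51, 66,  0, 68],
              [77, 46, 74, 74, 86],
              [51, 59, 48, 81, 75]], dtype="uint8")
rows = [1, 2, 3]
cols = [0, 2, 4]

# All (row, col) pairs: (1, 0), (1, 2), (1, 4), (2, 0), ...
pairs = list(itertools.product(rows, cols))
pair_rows, pair_cols = zip(*pairs)

flat = x[list(pair_rows), list(pair_cols)]
print(flat.shape)  # (9,): the correct values, but in 1D

# Reshaping recovers the 3x3 subarray.
grid = flat.reshape(len(rows), len(cols))
print(grid)
```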
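But np.ix_() gives us a grid of coordinates in a very compact form: an array of row indices with shape (3, 1) and an array of column indices with shape (1, 3), which broadcast against each other to cover all nine combinations: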
>>> np.ix_(rows, cols)
(array([[1],
       [2],
       [3]]), array([[0, 2, 4]]))
Using this to index gets us the 3x3 subarray we wanted:
>>> x[np.ix_(rows, cols)]
array([[72,  0, 63],
       [63, 66, 68],
       [77, 74, 86]], dtype=uint8)
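The compact grid works through broadcasting. A minimal sketch that builds the same two index arrays by hand with [:, None] and [None, :] slicing, checks them against np.ix_, and shows that a (3, 1) index broadcast against a (1, 3) index yields a (3, 3) result (again using a fixed copy of x):

```python
import numpy as np

x = np.array([[87, 57, 64, 48, 15],
              [72,  8,  0, 81, 63],
              [63, 51, 66,  0, 68],
              [77, 46, 74, 74, 86],
              [51, 59, 48, 81, 75]], dtype="uint8")
rows = [1, 2, 3]
cols = [0, 2, 4]

# Column vector of row indices, row vector of column indices.
row_idx = np.array(rows)[:, None]   # shape (3, 1)
col_idx = np.array(cols)[None, :]   # shape (1, 3)

ix_rows, ix_cols = np.ix_(rows, cols)
print(np.array_equal(row_idx, ix_rows), np.array_equal(col_idx, ix_cols))  # True True

# Broadcasting (3, 1) against (1, 3) selects a (3, 3) block.
sub = x[row_idx, col_idx]
print(sub.shape)  # (3, 3)
```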
Here's some pure Python that builds the same index grid by hand, to demonstrate how indexing with the result of np.ix_() behaves:
>>> all_rows = [[r]*len(cols) for r in rows]
>>> all_cols = [cols]*len(rows)
>>> all_rows
[[1, 1, 1], [2, 2, 2], [3, 3, 3]]
>>> all_cols
[[0, 2, 4], [0, 2, 4], [0, 2, 4]]
>>> x[all_rows, all_cols]
array([[72,  0, 63],
       [63, 66, 68],
       [77, 74, 86]], dtype=uint8)
Notice that all_rows and all_cols are 2D lists. Notice also that this is much more tedious and error-prone: you have to repeat each row by the number of columns and the columns by the number of rows, and keep track of which one is repeated element-wise and which one sublist-wise.
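If you do want the full 2D index grids spelled out, NumPy can build them for you. As an alternative the answer doesn't mention, np.meshgrid with indexing="ij" returns arrays equal to all_rows and all_cols; note that these are full-size (3, 3) arrays, whereas np.ix_() keeps the compact (3, 1) and (1, 3) shapes:

```python
import numpy as np

rows = [1, 2, 3]
cols = [0, 2, 4]

# indexing="ij" makes the first output vary along axis 0 (rows) and the
# second along axis 1 (columns), matching the hand-built lists.
grid_rows, grid_cols = np.meshgrid(rows, cols, indexing="ij")

print(grid_rows.tolist())  # [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
print(grid_cols.tolist())  # [[0, 2, 4], [0, 2, 4], [0, 2, 4]]
```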
Another nice benefit of np.ix_() is that it makes non-square subarrays just as easy, without any of the bookkeeping the pure Python approach requires:
>>> x[np.ix_([1, 2], [0, 1, 3, 4])]
array([[72,  8, 81, 63],
       [63, 51,  0, 68]], dtype=uint8)
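For comparison, the same non-square selection can also be done in two steps: pick the rows, then the columns of that result. This sketch (with a fixed copy of x) checks that the two spellings agree; np.ix_() simply does it in a single indexing operation:

```python
import numpy as np

x = np.array([[87, 57, 64, 48, 15],
              [72,  8,  0, 81, 63],
              [63, 51, 66,  0, 68],
              [77, 46, 74, 74, 86],
              [51, 59, 48, 81, 75]], dtype="uint8")
rows = [1, 2]
cols = [0, 1, 3, 4]

one_step = x[np.ix_(rows, cols)]
two_step = x[rows][:, cols]  # select rows first, then columns of that copy

print(one_step.shape)                      # (2, 4)
print(np.array_equal(one_step, two_step))  # True
```

One practical difference: because x[rows] is already a copy, assigning through the two-step form (x[rows][:, cols] = 0) leaves x unchanged, while x[np.ix_(rows, cols)] = 0 writes into x itself.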