Problem
import numpy as np
I have an array, without any prior information about its contents. For example:
ourarray = \
np.array([[0,1],
[2,3],
[4,5]])
I want to get the pairs of numbers which can be used for indexing ourarray, i.e. I want to get:
array([[0, 0, 1, 1, 2, 2],
[0, 1, 0, 1, 0, 1]])
(0,0; 0,1; 1,0; etc.: all the possible indices of ourarray are in this array.)
Similar but different posts
how to find indices of a 2d numpy array occuring in another 2d array: here they search for one array within another one, not returning indices of the entire array.
Find indices of rows of numpy 2d array in another 2D array: they are dealing with two arrays to start with; the objective isn't to create a second array, based on the first one, containing its indices.
Attempt 1 (Successful but inefficient)
I can get this array by:
np.array(np.where(np.ones(ourarray.shape)))
This gives the desired result, but it requires creating np.ones(ourarray.shape), which does not seem like an efficient way of doing it.
Attempt 2 (Failed)
I also tried:
np.array(np.where(ourarray))
which does not work because no indices are returned for the 0 entry of ourarray.
Question
Attempt 1 works, but I am looking for a more efficient way. How can I do this more efficiently?
CodePudding user response:
You can use numpy.argwhere and then .T to get what you want. Try this:
>>> ourarray = np.array([[0,1],[2,3], [4,5]])
>>> np.argwhere(ourarray>=0).T
array([[0, 0, 1, 1, 2, 2],
[0, 1, 0, 1, 0, 1]])
If your array may contain arbitrary values (for example negative numbers or NaN, for which ourarray>=0 is False), you can use this instead:
ourarray = np.array([[np.nan,1],[2,np.inf], [-4,-5]])
np.argwhere(np.ones(ourarray.shape)==1).T
# array([[0, 0, 1, 1, 2, 2],
# [0, 1, 0, 1, 0, 1]])
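To make the difference between the two masks concrete, here is a minimal sketch. The array name `arr` and its values are only illustrative; the point is that a value-based comparison drops some positions, while a mask built from the shape alone does not depend on the contents.

```python
import numpy as np

# Illustrative array containing NaN, inf and negative entries.
arr = np.array([[np.nan, 1.0], [2.0, np.inf], [-4.0, -5.0]])

# The comparison is False for NaN and for negative values,
# so those positions are missing from the result.
partial = np.argwhere(arr >= 0).T

# A mask built from the shape alone never looks at the values,
# so every position is returned.
full = np.argwhere(np.ones(arr.shape, dtype=bool)).T

print(partial.shape)  # only 3 of the 6 positions survive the comparison
print(full.shape)     # all 6 positions
```

Using `dtype=bool` for the mask is an optional choice here; it just avoids allocating a float array of ones.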
CodePudding user response:
How do you intend to use this index?
The tuple produced by nonzero (where) is designed for convenient indexing:
In [54]: idx = np.nonzero(np.ones_like(ourarray))
In [55]: idx
Out[55]: (array([0, 0, 1, 1, 2, 2]), array([0, 1, 0, 1, 0, 1]))
In [56]: ourarray[idx]
Out[56]: array([0, 1, 2, 3, 4, 5])
or equivalently using the 2 arrays explicitly:
In [57]: ourarray[idx[0], idx[1]]
Out[57]: array([0, 1, 2, 3, 4, 5])
Your np.array(idx) can be used as in [57] but not as in [56]. The use of a tuple in [56] is important.
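The tuple-versus-array distinction can be checked directly. The sketch below reuses the names from the session above; `idx_arr` is an extra name introduced only for illustration.

```python
import numpy as np

ourarray = np.array([[0, 1], [2, 3], [4, 5]])
idx = np.nonzero(np.ones_like(ourarray))  # tuple of two index arrays
idx_arr = np.array(idx)                   # the same data as one (2, 6) array

# A tuple supplies one index array per dimension.
by_tuple = ourarray[idx]

# A single 2d index array is applied to the first axis only,
# so whole rows are selected and the result has a different shape.
by_array = ourarray[idx_arr]

# Passing the two rows separately, or converting back to a tuple,
# restores the element-wise behaviour.
by_rows = ourarray[idx_arr[0], idx_arr[1]]
by_tuple_again = ourarray[tuple(idx_arr)]

print(by_tuple.shape, by_array.shape)
```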
If we apply transpose to this we get an array:
In [58]: tidx = np.transpose(idx)
In [59]: tidx
Out[59]:
array([[0, 0],
[0, 1],
[1, 0],
[1, 1],
[2, 0],
[2, 1]])
To use that for indexing we have to iterate:
In [60]: [ourarray[i,j] for i,j in tidx]
Out[60]: [0, 1, 2, 3, 4, 5]
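If the Python-level loop is a concern, one possible alternative is to split the (N, 2) array back into its columns and index once. This is a sketch that assumes the same `ourarray` and `tidx` as in the session above.

```python
import numpy as np

ourarray = np.array([[0, 1], [2, 3], [4, 5]])
tidx = np.argwhere(np.ones_like(ourarray))  # shape (6, 2), one index pair per row

# Pass each column as the index for one dimension.
values = ourarray[tidx[:, 0], tidx[:, 1]]

# The same thing, written so that it works for any number of dimensions.
values_nd = ourarray[tuple(tidx.T)]

print(values)
```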
argwhere, as proposed in the other answer, is just the transpose of the nonzero result. Using ourarray>=0 is really no different from the np.ones expression: for this array, both make an array that is True/1 for all elements.
In [61]: np.argwhere(np.ones_like(ourarray))
Out[61]:
array([[0, 0],
[0, 1],
[1, 0],
[1, 1],
[2, 0],
[2, 1]])
There are other ways of generating indices (np.indices, np.meshgrid, np.mgrid, np.ndindex), but they will require some sort of reshaping and/or transpose to get exactly what you want:
In [71]: np.indices(ourarray.shape)
Out[71]:
array([[[0, 0],
[1, 1],
[2, 2]],
[[0, 1],
[0, 1],
[0, 1]]])
In [72]: np.indices(ourarray.shape).reshape(2,6)
Out[72]:
array([[0, 0, 1, 1, 2, 2],
[0, 1, 0, 1, 0, 1]])
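For comparison, here is a sketch of how the other generators could be brought into the same (2, 6) layout. The shape (3, 2) is hard-coded to match ourarray, and the variable names are illustrative only.

```python
import numpy as np

shape = (3, 2)  # same shape as ourarray
reference = np.indices(shape).reshape(len(shape), -1)

# np.ndindex yields index tuples one at a time; collect and transpose them.
from_ndindex = np.array(list(np.ndindex(*shape))).T

# np.mgrid produces the same dense grid as np.indices.
from_mgrid = np.mgrid[0:3, 0:2].reshape(2, -1)

# np.meshgrid needs indexing="ij" to follow the row/column order.
grids = np.meshgrid(np.arange(3), np.arange(2), indexing="ij")
from_meshgrid = np.array(grids).reshape(2, -1)

# np.unravel_index converts flat positions into one index array per dimension.
flat = np.arange(np.prod(shape))
from_unravel = np.array(np.unravel_index(flat, shape))

print(reference)
```

None of these needs a mask array, since they work from the shape alone.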
timings
If ourarray>=0 works, it is faster than np.ones:
In [79]: timeit np.ones_like(ourarray)
6.22 µs ± 11.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [80]: timeit ourarray>=0
1.43 µs ± 15 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
np.where/nonzero adds a non-trivial time to that:
In [81]: timeit np.nonzero(ourarray>=0)
6.43 µs ± 8.15 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
and a bit more time to convert the tuple to array:
In [82]: timeit np.array(np.nonzero(ourarray>=0))
10.4 µs ± 35.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
The transpose round trip of argwhere adds more time:
In [83]: timeit np.argwhere(ourarray>=0).T
16.9 µs ± 35.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
np.indices is about the same as [82], though it may scale differently:
In [84]: timeit np.indices(ourarray.shape).reshape(2,-1)
10.9 µs ± 33.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
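The timings above are for a 3x2 array, where fixed call overhead dominates. To see how the approaches scale, a rough sketch with the standard timeit module could look like the following. The array size (200x300) and the repeat count are arbitrary assumptions, and the numbers will vary by machine and NumPy version.

```python
import timeit
import numpy as np

big = np.zeros((200, 300))  # arbitrary larger array; its values are irrelevant

candidates = {
    "array(nonzero(ones))": lambda: np.array(np.nonzero(np.ones(big.shape, dtype=bool))),
    "argwhere(ones).T": lambda: np.argwhere(np.ones(big.shape, dtype=bool)).T,
    "indices.reshape": lambda: np.indices(big.shape).reshape(big.ndim, -1),
}

# Check that all approaches agree before comparing their speed.
results = {name: fn() for name, fn in candidates.items()}

repeats = 20
for name, fn in candidates.items():
    seconds = timeit.timeit(fn, number=repeats)
    print(f"{name}: {seconds / repeats * 1e6:.1f} us per call")
```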