How to get a 2D array containing indices of another 2D array

Time:10-07

Problem

import numpy as np

I have an array whose contents I know nothing about in advance. For example:

ourarray = \
np.array([[0,1],
          [2,3],
          [4,5]])

I want to get the pairs of numbers that can be used to index ourarray, i.e. I want to get:

array([[0, 0, 1, 1, 2, 2],
       [0, 1, 0, 1, 0, 1]])

(That is, (0,0), (0,1), (1,0), etc.: every possible index of ourarray appears as a column of this array.)


Attempt 1 (Successful but inefficient)

I can get this array by:

np.array(np.where(np.ones(ourarray.shape)))

This gives the desired result, but it requires creating np.ones(ourarray.shape), which does not seem like an efficient way of doing it.


Attempt 2 (Failed)

I also tried:

np.array(np.where(ourarray))

which does not work because no indices are returned for the 0 entry of ourarray.
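To make the difference between the two attempts concrete, here is a minimal sketch that compares the shapes of both results. The names idx_nonzero and idx_all are only illustrative and are not part of the original attempts.

```python
import numpy as np

ourarray = np.array([[0, 1],
                     [2, 3],
                     [4, 5]])

# With a single argument, np.where behaves like np.nonzero:
# positions holding a zero (falsy) value are skipped.
idx_nonzero = np.array(np.where(ourarray))
print(idx_nonzero.shape)  # (2, 5): the index (0, 0) is missing

# An all-ones array has no zero entries, so every position is returned.
idx_all = np.array(np.where(np.ones(ourarray.shape)))
print(idx_all.shape)      # (2, 6)
```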


Question

Attempt 1 works, but I am looking for a more efficient way. How can I do this more efficiently?

CodePudding user response:

You can use numpy.argwhere and then transpose the result with .T to get what you want.

Try this:

>>> ourarray = np.array([[0,1],[2,3], [4,5]])
>>> np.argwhere(ourarray>=0).T
array([[0, 0, 1, 1, 2, 2],
       [0, 1, 0, 1, 0, 1]])

If the array may contain arbitrary values (for example negative numbers or NaN, for which >= 0 is False), you can use this instead:

ourarray = np.array([[np.nan,1],[2,np.inf], [-4,-5]])
np.argwhere(np.ones(ourarray.shape)==1).T
# array([[0, 0, 1, 1, 2, 2],
#        [0, 1, 0, 1, 0, 1]])
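As a rough illustration of why the comparison-based mask is not safe for arbitrary contents, the sketch below applies both masks to the same array. The variable names (mixed, by_comparison, by_ones) are hypothetical and chosen only for this example.

```python
import numpy as np

mixed = np.array([[np.nan, 1], [2, np.inf], [-4, -5]])

# NaN >= 0 and negative >= 0 both evaluate to False,
# so those positions drop out of the result.
by_comparison = np.argwhere(mixed >= 0).T
print(by_comparison.shape)  # (2, 3)

# A mask built only from the shape does not depend on the values at all.
by_ones = np.argwhere(np.ones(mixed.shape, dtype=bool)).T
print(by_ones.shape)        # (2, 6)
```

Building the mask from the shape alone keeps the result independent of what the array happens to contain.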

CodePudding user response:

How do you intend to use this index?

The tuple produced by nonzero (where) is designed for convenient indexing:

In [54]: idx = np.nonzero(np.ones_like(ourarray))
In [55]: idx
Out[55]: (array([0, 0, 1, 1, 2, 2]), array([0, 1, 0, 1, 0, 1]))
In [56]: ourarray[idx]
Out[56]: array([0, 1, 2, 3, 4, 5])

or equivalently using the 2 arrays explicitly:

In [57]: ourarray[idx[0], idx[1]]
Out[57]: array([0, 1, 2, 3, 4, 5])

Your np.array(idx) can be used as in [57] but not as in [56]. The use of a tuple in [56] is important.
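A small sketch of that difference, assuming the same ourarray as in the question (as_array, with_tuple and with_array are illustrative names): indexing with the (2, 6) array does not raise, but it is applied to the first axis only, so the result has a different shape.

```python
import numpy as np

ourarray = np.array([[0, 1], [2, 3], [4, 5]])
idx = np.nonzero(np.ones_like(ourarray))  # tuple of two index arrays
as_array = np.array(idx)                  # the same data as a (2, 6) array

# A tuple supplies one index array per axis.
with_tuple = ourarray[idx]
print(with_tuple.shape)   # (6,)

# A single array indexes axis 0 only, selecting whole rows.
with_array = ourarray[as_array]
print(with_array.shape)   # (2, 6, 2)

# Converting back to a tuple restores the per-axis behaviour.
restored = ourarray[tuple(as_array)]
```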

If we apply transpose to this tuple, we get an array with one (row, column) pair per row:

In [58]: tidx = np.transpose(idx)
In [59]: tidx
Out[59]: 
array([[0, 0],
       [0, 1],
       [1, 0],
       [1, 1],
       [2, 0],
       [2, 1]])

To use that for indexing one pair at a time, we have to iterate:

In [60]: [ourarray[i,j] for i,j in tidx]
Out[60]: [0, 1, 2, 3, 4, 5]
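If the loop is not wanted, one possible alternative (a sketch, not something shown in the original transcript) is to split the (n, 2) array back into one index array per axis before indexing. The names by_columns and by_tuple are illustrative.

```python
import numpy as np

ourarray = np.array([[0, 1], [2, 3], [4, 5]])
tidx = np.argwhere(np.ones_like(ourarray))  # shape (6, 2), one pair per row

# The explicit loop from the transcript above.
looped = [ourarray[i, j] for i, j in tidx]

# Pass each column as the index array for one axis.
by_columns = ourarray[tidx[:, 0], tidx[:, 1]]

# The same idea for any number of dimensions.
by_tuple = ourarray[tuple(tidx.T)]
```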

argwhere, as proposed in the other answer, is just the transpose. Using ourarray>=0 is really no different from the np.ones expression. Both make an array that is True/1 for all elements.

In [61]: np.argwhere(np.ones_like(ourarray))
Out[61]: 
array([[0, 0],
       [0, 1],
       [1, 0],
       [1, 1],
       [2, 0],
       [2, 1]])

There are other ways of generating indices (np.indices, np.meshgrid, np.mgrid, np.ndindex), but they require some sort of reshaping and/or transpose to get exactly what you want:

In [71]: np.indices(ourarray.shape)
Out[71]: 
array([[[0, 0],
        [1, 1],
        [2, 2]],

       [[0, 1],
        [0, 1],
        [0, 1]]])
In [72]: np.indices(ourarray.shape).reshape(2,6)
Out[72]: 
array([[0, 0, 1, 1, 2, 2],
       [0, 1, 0, 1, 0, 1]])
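The reshape above hard-codes the 2-D case. A hedged sketch of a version that works for any number of dimensions follows; all_indices is a hypothetical helper name, and the np.unravel_index and np.ndindex variants are added here only as cross-checks, not as part of the original answer.

```python
import numpy as np

def all_indices(shape):
    """Return an (ndim, size) array whose columns are all indices of `shape`.

    Hypothetical helper: wraps np.indices plus a reshape, in C (row-major) order.
    """
    return np.indices(shape).reshape(len(shape), -1)

ourarray = np.array([[0, 1], [2, 3], [4, 5]])
via_indices = all_indices(ourarray.shape)
print(via_indices)

# Cross-check 1: convert flat positions 0..size-1 into per-axis indices.
via_unravel = np.array(np.unravel_index(np.arange(ourarray.size), ourarray.shape))

# Cross-check 2: np.ndindex yields index tuples one by one (slow for large arrays).
via_ndindex = np.array(list(np.ndindex(*ourarray.shape))).T
```

None of these variants reads the values of ourarray; they only need its shape.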

Timings

If ourarray>=0 works, it is faster than np.ones:

In [79]: timeit np.ones_like(ourarray)
6.22 µs ± 11.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [80]: timeit ourarray>=0
1.43 µs ± 15 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

np.where/nonzero adds a non-trivial time to that:

In [81]: timeit np.nonzero(ourarray>=0)
6.43 µs ± 8.15 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

and a bit more time to convert the tuple to array:

In [82]: timeit np.array(np.nonzero(ourarray>=0))
10.4 µs ± 35.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

The transpose round trip of argwhere adds more time:

In [83]: timeit np.argwhere(ourarray>=0).T
16.9 µs ± 35.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

np.indices is about the same as [82], though it may scale differently.

In [84]: timeit np.indices(ourarray.shape).reshape(2,-1)
10.9 µs ± 33.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
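The timings above are for a 3x2 array, where fixed call overhead dominates. To see how the approaches scale, something like the following sketch could be run outside IPython with the standard timeit module. The array size and the names big and candidates are assumptions made for this example, and no results are quoted because they depend on the machine and NumPy version.

```python
import timeit
import numpy as np

# Assumed test size; adjust to match the arrays you actually work with.
big = np.arange(300 * 400).reshape(300, 400)

candidates = {
    "nonzero(ones)": lambda: np.array(np.nonzero(np.ones(big.shape, dtype=bool))),
    "argwhere(ones).T": lambda: np.argwhere(np.ones(big.shape, dtype=bool)).T,
    "indices.reshape": lambda: np.indices(big.shape).reshape(big.ndim, -1),
}

# Confirm that every candidate produces the same index array before timing it.
results = {name: fn() for name, fn in candidates.items()}

for name, fn in candidates.items():
    runs = 20
    seconds = timeit.timeit(fn, number=runs) / runs
    print(f"{name:>18}: {seconds * 1e6:.1f} µs per call")
```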