Preventing Numpy Slice/Index from changing the order of the output automatically

Time:04-28

Hi, basically it's easiest to show with the example below:

import numpy as np

testArray = np.array([[False, False, True, False], [False, False, True, False], [False, False, True, False], [False, True, False, False]]).T
wordArray = np.repeat(['A', 'B', 'C', 'D'], 4).reshape(4,4)

testArray
> array([[False, False, False, False],
         [False, False, False,  True],
         [ True,  True,  True, False],
         [False, False, False, False]])

wordArray[testArray]
> array(['B', 'C', 'C', 'C'], dtype='<U1')

What I'm expecting is for the order of my "indexing" to be preserved, so I want array(['C', 'C', 'C', 'B'], dtype='<U1'), i.e. the order in which the True values appear in testArray as I wrote it.

Thanks

-EDIT-

Upon further testing (modifying wordArray), it looks as if it just reverses the output instead of sorting it. Does anyone know where I can read more about this? Is this guaranteed to happen?

I'm running NumPy 1.21.5 and Python 3.10.

Answer:

The order is not changed; it is preserved. Boolean indexing simply picks the elements row by row (row-major order).

import numpy as np

testArray = np.array([[False, False, True, False], [False, False, True, False], [False, False, True, False], [False, True, False, False]]).T
wordArray = np.repeat(['A', 'B', 'C', 'D'], 4).reshape(4,4)

wordArray
array([['A', 'A', 'A', 'A'],
       ['B', 'B', 'B', 'B'],
       ['C', 'C', 'C', 'C'],
       ['D', 'D', 'D', 'D']], dtype='<U1')
testArray
array([[False, False, False, False],
       [False, False, False,  True],
       [ True,  True,  True, False],
       [False, False, False, False]])
wordArray[testArray]
array(['B', 'C', 'C', 'C'], dtype='<U1')

Explanation:

There is no True in testArray[0], so there is no A. In testArray[1] there is a True at index 3, which is why there is a B. In testArray[2] there are three True values, so C, C, C. And there is no True in the last row, testArray[3], which is why no D from wordArray[3] appears.
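To make that row-by-row walk explicit, here is a minimal sketch that reproduces the selection with two nested loops. `mask_select_loop` is a hypothetical helper written only for illustration; it is not a NumPy function, and it only handles 2-D arrays.

```python
import numpy as np

def mask_select_loop(values, mask):
    """Hypothetical, illustration-only equivalent of values[mask] for 2-D arrays."""
    picked = []
    for i in range(mask.shape[0]):          # walk the rows first...
        for j in range(mask.shape[1]):      # ...then the columns within each row
            if mask[i, j]:
                picked.append(values[i, j])
    return np.array(picked)

testArray = np.array([[False, False, True, False],
                      [False, False, True, False],
                      [False, False, True, False],
                      [False, True, False, False]]).T
wordArray = np.repeat(['A', 'B', 'C', 'D'], 4).reshape(4, 4)

print(mask_select_loop(wordArray, testArray))   # ['B' 'C' 'C' 'C']
print(wordArray[testArray])                     # ['B' 'C' 'C' 'C']
```

Because the outer loop runs over rows, the B from row 1 is found before the C values from row 2, exactly as in the boolean-indexing result.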

If you want the output mentioned in the question, transpose wordArray instead of testArray:

testArray = np.array([[False, False, True, False], [False, False, True, False], [False, False, True, False], [False, True, False, False]])
wordArray = np.repeat(['A', 'B', 'C', 'D'], 4).reshape(4,4).T
wordArray[testArray]
array(['C', 'C', 'C', 'B'], dtype='<U1')

Take an example with two 4x6 arrays:

testArray = np.array([[False, False, True, False], [False, False, True, False], [False, False, True, False], [False, True, False, False], [False, True, True, False], [True, False, False, True]]).T
testArray
array([[False, False, False, False, False,  True],
       [False, False, False,  True,  True, False],
       [ True,  True,  True, False,  True, False],
       [False, False, False, False, False,  True]])
wordArray = np.repeat(['A', 'B', 'C', 'D'], 6).reshape(4,6)
wordArray
array([['A', 'A', 'A', 'A', 'A', 'A'],
       ['B', 'B', 'B', 'B', 'B', 'B'],
       ['C', 'C', 'C', 'C', 'C', 'C'],
       ['D', 'D', 'D', 'D', 'D', 'D']], dtype='<U1')
wordArray[testArray]
array(['A', 'B', 'B', 'C', 'C', 'C', 'C', 'D'], dtype='<U1')

Explanation:

In testArray[0], True is at index 5, so the A from wordArray[0] is the output array's first element. In testArray[1], True is at indices 3 and 4, so the B from wordArray[1] becomes the second and third elements. In testArray[2], True is at indices 0, 1, 2 and 4, so the C from wordArray[2] becomes the fourth, fifth, sixth and seventh elements. In testArray[3], True is at index 5, so the D from wordArray[3] is the last element.
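The counting above can be checked directly. This is a minimal sketch that relies on the fact that every row of this particular wordArray holds a single repeated letter, so the selection reduces to "letter of row i, repeated once per True in row i". That shortcut is specific to this example and would not hold for arbitrary data.

```python
import numpy as np

testArray = np.array([[False, False, True, False],
                      [False, False, True, False],
                      [False, False, True, False],
                      [False, True, False, False],
                      [False, True, True, False],
                      [True, False, False, True]]).T
wordArray = np.repeat(['A', 'B', 'C', 'D'], 6).reshape(4, 6)

# Number of True values in each row of the mask
counts = testArray.sum(axis=1)

# Each row of wordArray is one repeated letter, so repeat that letter
# counts[i] times, keeping the rows in their original (top-to-bottom) order
by_counts = np.repeat(wordArray[:, 0], counts)

print(counts)                # [1 2 4 1]
print(by_counts)             # ['A' 'B' 'B' 'C' 'C' 'C' 'C' 'D']
print(wordArray[testArray])  # ['A' 'B' 'B' 'C' 'C' 'C' 'C' 'D']
```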


Conclusion:

Just print the arrays, go through them row by row and check for True values. A value from wordArray appears in the output only if testArray is True at that position, and the values come out in row-major order. So the order is not shuffled or changed; it is preserved.
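One way to see that the result is neither sorted nor reversed, but simply read in row-major order, is to index an array whose entries are all different. A minimal sketch, using np.arange to label each cell with its own flat (row-major) position:

```python
import numpy as np

testArray = np.array([[False, False, True, False],
                      [False, False, True, False],
                      [False, False, True, False],
                      [False, True, False, False]]).T

# Label every cell with its row-major position: 0..15 for a 4x4 array
positions = np.arange(testArray.size).reshape(testArray.shape)

picked = positions[testArray]
print(picked)                          # [ 7  8  9 10]
print(np.all(np.diff(picked) > 0))     # True: strictly increasing positions
```

The selected positions always increase, so the "reversal" seen with the letters is just a coincidence of which rows contain True values.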

Answer:

According to the boolean array indexing section of the NumPy indexing docs:

https://numpy.org/doc/stable/user/basics.indexing.html#boolean-array-indexing

If obj.ndim == x.ndim, x[obj] returns a 1-dimensional array filled with 
the elements of x corresponding to the True values of obj. The search 
order will be row-major, C-style.

That's the same as indexing the raveled (flattened) arrays, since ravel also uses row-major, C-style order by default:

In [92]: testArray.ravel()
Out[92]: 
array([False, False, False, False, False, False, False,  True,  True,
        True,  True, False, False, False, False, False])
In [94]: wordArray.ravel()
Out[94]: 
array(['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'D',
       'D', 'D', 'D'], dtype='<U1')

So the indexing is:

In [96]: wordArray.ravel()[testArray.ravel()]
Out[96]: array(['B', 'C', 'C', 'C'], dtype='<U1')
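This equivalence is not specific to this example. A quick sketch that checks it on random data of a few shapes; the shapes, seed and value range are arbitrary choices made for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the run is reproducible

checks = []
for shape in [(4, 4), (4, 6), (3, 5, 2)]:
    values = rng.integers(0, 100, size=shape)
    mask = rng.random(shape) < 0.5            # random boolean mask, same shape
    # Boolean indexing vs. indexing the C-order (row-major) flattened arrays
    checks.append(np.array_equal(values[mask], values.ravel()[mask.ravel()]))

print(all(checks))   # True
```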

You could ravel in column-major (Fortran-style) order instead:

In [97]: wordArray.ravel(order='F')[testArray.ravel(order='F')]
Out[97]: array(['C', 'C', 'C', 'B'], dtype='<U1')

That's equivalent to working with the transposes.

In [98]: wordArray.T[testArray.T]
Out[98]: array(['C', 'C', 'C', 'B'], dtype='<U1')
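For 2-D arrays, flattening in 'F' order gives the same sequence as flattening the transpose in the default 'C' order, which is why the two approaches agree. A small sketch that checks this on random 2-D data (shapes and seed chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(1)

checks = []
for shape in [(4, 4), (4, 6), (2, 7)]:
    values = rng.integers(0, 100, size=shape)
    mask = rng.random(shape) < 0.5
    # Column-major selection via Fortran-order ravel...
    column_major = values.ravel(order='F')[mask.ravel(order='F')]
    # ...versus row-major selection on the transposed arrays
    via_transpose = values.T[mask.T]
    checks.append(np.array_equal(column_major, via_transpose))

print(all(checks))   # True
```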

It may be easier to visualize this indexing by looking at the indices of the True values in testArray:

As a tuple of arrays, or as a 2d array:

In [99]: np.nonzero(testArray)
Out[99]: (array([1, 2, 2, 2]), array([3, 0, 1, 2]))
In [100]: np.argwhere(testArray)
Out[100]: 
array([[1, 3],    # row 1, column 3, B
       [2, 0],    # row 2, columns 0-2: all C
       [2, 1],
       [2, 2]])
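These indices can also be used directly. The NumPy indexing docs describe a boolean index as behaving like the integer indices returned by its nonzero(), so a minimal sketch can reproduce the result that way and, by re-sorting the (row, column) pairs by column with np.lexsort, recover the column-by-column order asked for in the question:

```python
import numpy as np

testArray = np.array([[False, False, True, False],
                      [False, False, True, False],
                      [False, False, True, False],
                      [False, True, False, False]]).T
wordArray = np.repeat(['A', 'B', 'C', 'D'], 4).reshape(4, 4)

rows, cols = np.nonzero(testArray)   # row-major: rows [1 2 2 2], cols [3 0 1 2]

# Integer indexing with the nonzero() indices matches wordArray[testArray]
print(wordArray[rows, cols])                 # ['B' 'C' 'C' 'C']

# Sort the same pairs by column first, then by row
# (np.lexsort uses its LAST key as the primary sort key)
order = np.lexsort((rows, cols))
print(wordArray[rows[order], cols[order]])   # ['C' 'C' 'C' 'B']
```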