Numpy array slicing to return sliced array and corresponding array indices

Time:06-17

I'm trying to generate two numpy arrays from one: one that is a slice of the original array, and another that holds the indices which can be used to look up those values in the original. The best way I can explain this is by example:

import numpy as np

original = np.array([
    [5, 3, 7, 3, 2],
    [8, 4, 22, 6, 4],
])

sliced_array = original[:,::3]
indices_of_slice = None # here, what command to use?

for val, idx in zip(np.nditer(sliced_array), np.nditer(indices_of_slice)):
    # I want indices_of_slice to behave the following way:
    assert val == original[idx], "Error. This implementation is not correct. "

Ultimately, what I'm aiming for is an array that I can iterate through with nditer, and a corresponding array indices_of_slice that returns the original lookup indices (i,j,...). Each value of the sliced array should then equal the value of the original array at index (i,j,...).

The main question is: can I get both the new sliced array and the indices of its values when slicing a numpy array? Hopefully all is clear!

Edit: Here are the expected printouts of the two arrays:

# print(sliced_array)
# >>> [[5 3]
# >>>  [8 6]]

# expected result of 
# print(indices_of_slice)
# >>> [[(0 0) (0 3)]
# >>>  [(1 0) (1 3)]]

CodePudding user response:

You can use numpy's slice np.s_[] with a tiny bit of gymnastics to get the indices you are looking for:

slc = np.s_[:, ::3]

shape = original.shape
ix = np.unravel_index(np.arange(np.prod(shape)).reshape(shape)[slc], shape)

>>> ix
(array([[0, 0],
        [1, 1]]),
 array([[0, 3],
        [0, 3]]))

>>> original[ix]
array([[5, 3],
       [8, 6]])

>>> original[slc]
array([[5, 3],
       [8, 6]])

Note that this also works with slices that step in the reverse direction:

slc = np.s_[:, ::-2]

# ... (as above)

>>> ix
(array([[0, 0, 0],
        [1, 1, 1]]),
 array([[4, 2, 0],
        [4, 2, 0]]))

>>> np.array_equal(original[ix], original[slc])
True
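
The steps above can be wrapped in a small helper that returns both arrays at once. This is only a sketch: slice_with_indices is a hypothetical name (not a NumPy function), and stacking the per-axis index arrays with np.stack is one assumed way to get the (i, j) pairs shown in the question.

```python
import numpy as np

def slice_with_indices(arr, slc):
    """Return arr[slc] and a tuple of index arrays pointing back into arr.

    Hypothetical helper (not part of NumPy), built on the
    arange + unravel_index trick from this answer.
    """
    # Array of flat positions with the same shape as arr, indexed the same way.
    flat_ids = np.arange(arr.size).reshape(arr.shape)[slc]
    ix = np.unravel_index(flat_ids, arr.shape)
    return arr[slc], ix

original = np.array([
    [5, 3, 7, 3, 2],
    [8, 4, 22, 6, 4],
])

sliced_array, ix = slice_with_indices(original, np.s_[:, ::3])

# One (i, j) pair per element of sliced_array; shape is (2, 2, 2).
indices_of_slice = np.stack(ix, axis=-1)

# Check the property asked for in the question.
for pos in np.ndindex(sliced_array.shape):
    i, j = indices_of_slice[pos]
    assert sliced_array[pos] == original[i, j]
```

The tuple ix can be used directly as original[ix], while indices_of_slice is the form to iterate over pair by pair.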

CodePudding user response:

In [22]: original = np.array([
    ...:     [5, 3, 7, 3, 2],
    ...:     [8, 4, 22, 6, 4],
    ...: ])    
In [23]: sliced_array = original[:,::3]

Make a boolean array with the same slicing:

In [24]: mask = np.zeros(original.shape, dtype=bool)    
In [25]: mask[:,::3] = True    
In [26]: mask
Out[26]: 
array([[ True, False, False,  True, False],
       [ True, False, False,  True, False]])

mask selects the same values - but in raveled form:

In [27]: sliced_array
Out[27]: 
array([[5, 3],
       [8, 6]])

In [28]: original[mask]
Out[28]: array([5, 3, 8, 6])

We can get indices from the mask:

In [30]: idx = np.argwhere(mask)

In [31]: idx
Out[31]: 
array([[0, 0],
       [0, 3],
       [1, 0],
       [1, 3]], dtype=int64)

And iteratively apply them:

In [32]: for ij,v in zip(idx, sliced_array.ravel()):
    ...:     print(original[tuple(ij)], v)
    ...:     
5 5
3 3
8 8
6 6

Testing this with advanced indexing:

In [49]: aslc = ([[0],[1]], [0,2,4])

In [50]: sliced_array = original[aslc]; sliced_array
Out[50]: 
array([[ 5,  7,  2],
       [ 8, 22,  4]])

In [51]: mask = np.zeros(original.shape, dtype=bool); mask[aslc] = True; mask
Out[51]: 
array([[ True, False,  True, False,  True],
       [ True, False,  True, False,  True]])

In [52]: idx = np.argwhere(mask); idx
Out[52]: 
array([[0, 0],
       [0, 2],
       [0, 4],
       [1, 0],
       [1, 2],
       [1, 4]], dtype=int64)

In [54]: original[mask]
Out[54]: array([ 5,  7,  2,  8, 22,  4])

In [55]: for ij,v in zip(idx, sliced_array.ravel()):
    ...:     print(original[tuple(ij)], v)
    ...:     
5 5
7 7
2 2
8 8
22 22
4 4

This doesn't work with all variations of indexing; for example, indices that reverse the order or duplicate rows won't match.

In [66]: aslc = (slice(None,None,-1),slice(3,None,-3))

In [67]: sliced_array = original[aslc]; sliced_array
Out[67]: 
array([[6, 8],
       [3, 5]])

In [68]: mask = np.zeros(original.shape, dtype=bool); mask[aslc] = True; mask
Out[68]: 
array([[ True, False, False,  True, False],
       [ True, False, False,  True, False]])

In [69]: original[mask]
Out[69]: array([5, 3, 8, 6])

mask selects the same values, but in a different order.
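
The duplicate-row case mentioned above fails for a similar reason: a boolean mask can only mark a position once. A minimal sketch, where the index list rows is an illustrative choice and not taken from the session above:

```python
import numpy as np

original = np.array([
    [5, 3, 7, 3, 2],
    [8, 4, 22, 6, 4],
])

rows = [0, 0, 1]                  # row 0 is selected twice
picked = original[rows]           # shape (3, 5); row 0 appears twice

mask = np.zeros(original.shape, dtype=bool)
mask[rows] = True                 # the repeat cannot be recorded in a mask

n_picked = picked.size            # number of values the index selected
n_masked = int(mask.sum())        # number of positions the mask marks
print(n_picked, n_masked)
```

The two counts differ, so argwhere(mask) cannot be zipped against picked.ravel() here.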

The slice is a view of original. That is, it uses the same data buffer, but it starts at a different point and uses different strides.

In [70]: original.__array_interface__
Out[70]: 
{'data': (2697031319568, False),
 'strides': None,
 'descr': [('', '<i4')],
 'typestr': '<i4',
 'shape': (2, 5),
 'version': 3}

In [71]: sliced_array.__array_interface__
Out[71]: 
{'data': (2697031319600, False),
 'strides': (-20, -12),
 'descr': [('', '<i4')],
 'typestr': '<i4',
 'shape': (2, 2),
 'version': 3}

In general, numpy indexing is a one-way street. It creates a new array, whether view or copy, that has the desired values, but it does not create, or return, a mapping, or a reverse mapping. Except for some special cases, we can't identify where in original the sliced_array values are found.
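
One of those special cases is a view produced by basic slicing, where the start point and strides are enough to rebuild the indices. The following is only a sketch under stated assumptions: original is C-contiguous, the view comes from slices alone (no dimensions added or dropped), and the names start, steps and axes are illustrative.

```python
import numpy as np

original = np.array([
    [5, 3, 7, 3, 2],
    [8, 4, 22, 6, 4],
])
view = original[::-1, 3::-3]

# Byte offset of the view's first element from original's first element.
offset = (view.__array_interface__['data'][0]
          - original.__array_interface__['data'][0])

# Convert that offset to a (row, col) position in original.
start = np.unravel_index(offset // original.itemsize, original.shape)

# Step along each axis, measured in elements of original.
steps = [v_stride // o_stride
         for v_stride, o_stride in zip(view.strides, original.strides)]

# Per-axis index vectors, combined into broadcastable index arrays.
axes = [s + step * np.arange(n)
        for s, step, n in zip(start, steps, view.shape)]
ix = np.ix_(*axes)

assert np.array_equal(original[ix], view)
```

This does not carry over to results of advanced indexing, which are copies with their own data buffer.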

edit

The other answer suggests starting with np.s_:

In [85]: np.s_[:,::3]
Out[85]: (slice(None, None, None), slice(None, None, 3))

That produces a tuple of slice objects. The other answer still has to use arange to generate the indices, since the slices themselves don't carry the original.shape information.

ogrid can be used to create advanced indexing arrays:

In [86]: idx = np.ogrid[:2,:5:3]; idx
Out[86]: 
[array([[0],
        [1]]),
 array([[0, 3]])]
In [88]: original[tuple(idx)]
Out[88]: 
array([[5, 3],
       [8, 6]])

meshgrid with sparse=True gives something similar. Or with fully populated arrays:

In [89]: idx = np.mgrid[:2,:5:3]; idx
Out[89]: 
array([[[0, 0],
        [1, 1]],

       [[0, 3],
        [0, 3]]])

In [90]: original[tuple(idx)]
Out[90]: 
array([[5, 3],
       [8, 6]])
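
For comparison, here is a sketch of the meshgrid variant mentioned above. It assumes indexing='ij' so that the first index array varies along rows; with the default indexing='xy' the shapes come out transposed.

```python
import numpy as np

original = np.array([
    [5, 3, 7, 3, 2],
    [8, 4, 22, 6, 4],
])

rows = np.arange(2)
cols = np.arange(0, 5, 3)

# sparse=True gives broadcastable arrays, much like ogrid.
ii, jj = np.meshgrid(rows, cols, indexing='ij', sparse=True)
print(ii.shape, jj.shape)

result = original[ii, jj]
print(result)
```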

There are various ways of transforming that array into a (n,2) set of indices that could be used individually (like argwhere in my previous code):

In [92]: np.transpose(idx,(1,2,0)).reshape(-1,2)
Out[92]: 
array([[0, 0],
       [0, 3],
       [1, 0],
       [1, 3]])

In [93]: for ij in _: print(original[tuple(ij)])
5
3
8
6

ix_ is another handy tool for creating advanced indexing arrays, the equivalent of slices:

In [94]: np.ix_(np.arange(2), np.arange(0,5,3))
Out[94]: 
(array([[0],
        [1]]),
 array([[0, 3]]))

In [95]: original[_]
Out[95]: 
array([[5, 3],
       [8, 6]])

Keep in mind that indexing with slices, as in original[:,::3], produces a view. Indexing with arrays is slower, since it makes a copy. And iterative indexing is slower still.
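
The view/copy difference can be checked with np.shares_memory. A small sketch (the variable names are illustrative):

```python
import numpy as np

original = np.array([
    [5, 3, 7, 3, 2],
    [8, 4, 22, 6, 4],
])

# Basic slicing: a view onto original's data buffer.
sliced_view = original[:, ::3]

# Advanced indexing with arrays: a new array with its own data.
indexed_copy = original[np.ix_(np.arange(2), np.arange(0, 5, 3))]

view_shares = np.shares_memory(original, sliced_view)
copy_shares = np.shares_memory(original, indexed_copy)
print(view_shares, copy_shares)
```

Both results hold the same values; only the first one reuses the memory of original.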

In [96]: mask
Out[96]: 
array([[ True, False, False,  True, False],
       [ True, False, False,  True, False]])

In [97]: np.nonzero(mask)      # aka np.where(mask)
Out[97]: (array([0, 0, 1, 1], dtype=int64), array([0, 3, 0, 3], dtype=int64))

In [98]: np.argwhere(mask)
Out[98]: 
array([[0, 0],
       [0, 3],
       [1, 0],
       [1, 3]], dtype=int64)

nonzero produces a tuple of arrays that can be used directly to index the array:

In [99]: original[Out[97]]
Out[99]: array([5, 3, 8, 6])

argwhere gives the same values, but in an (n,2) form that has to be used iteratively, or column by column in a somewhat awkward expression:

In [100]: original[Out[98][:,0], Out[98][:,1]]
Out[100]: array([5, 3, 8, 6])
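
A slightly less awkward spelling, as a sketch: transposing the (n,2) argwhere result and converting it to a tuple gives the same per-axis index arrays that nonzero returns.

```python
import numpy as np

original = np.array([
    [5, 3, 7, 3, 2],
    [8, 4, 22, 6, 4],
])

mask = np.zeros(original.shape, dtype=bool)
mask[:, ::3] = True

idx = np.argwhere(mask)            # shape (4, 2), one (i, j) row per hit

# idx.T has shape (2, 4); as a tuple it is one index array per axis.
values = original[tuple(idx.T)]
print(values)
```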