Vectorised slicing of multiple rows in ndarray at arbitrary positions

Time:01-02

I often find myself holding an array not of indices, but of index bounds that effectively define multiple slices, one per row. A representative example:

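import numpy as np

rand = np.random.default_rng(seed=0)
sample = rand.integers(low=0, high=10, size=(10, 10))
y, x = np.mgrid[:10, :10]  # dense row (y) and column (x) index grids
# One start column per row, shape (10, 1)
bad_starts = rand.integers(low=0, high=10, size=(10, 1))
print(bad_starts)

# In the first five rows, overwrite everything from the start column onward
sample[
    (x >= bad_starts) & (y < 5)
] = -1

print(sample)

Output:
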
[[4]
 [7]
 [3]
 [2]
 [7]
 [8]
 [0]
 [0]
 [6]
 [3]]
[[ 8  6  5  2 -1 -1 -1 -1 -1 -1]
 [ 6  9  5  6  9  7  6 -1 -1 -1]
 [ 2  8  6 -1 -1 -1 -1 -1 -1 -1]
 [ 8  1 -1 -1 -1 -1 -1 -1 -1 -1]
 [ 4  0  0  1  0  6  5 -1 -1 -1]
 [ 7  3  4  9  8  9  3  6  9  6]
 [ 8  6  7  3  8  1  5  7  8  5]
 [ 3  3  4  4  7  8  0  9  5  3]
 [ 6  5  2  3  7  5  5  3  7  3]
 [ 3  8  2  2  7  6  0  0  3  8]]

Is there a simpler way to accomplish the same thing with slices alone, avoiding having to call mgrid and avoiding an entire boolean predicate matrix?

CodePudding user response:

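With np.ogrid in place of np.mgrid (y, x = np.ogrid[:10, :10]) you get a 'sparse' grid: a (10, 1) column and a (1, 10) row instead of two full (10, 10) arrays.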

In [488]: y,x
Out[488]: 
(array([[0],
        [1],
        [2],
        [3],
        [4],
        [5],
        [6],
        [7],
        [8],
        [9]]),
 array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]]))

The mask expression is the same, since broadcasting combines the (10, 1) and (1, 10) arrays: (x >= bad_starts) & (y < 5)
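As a rough check of that claim, here is a small sketch that rebuilds the question's setup and compares the ogrid mask with the mgrid one. The names mask and dense_mask are mine, not from the question.

```python
import numpy as np

rand = np.random.default_rng(seed=0)
sample = rand.integers(low=0, high=10, size=(10, 10))
bad_starts = rand.integers(low=0, high=10, size=(10, 1))

# Sparse grids: y has shape (10, 1), x has shape (1, 10)
y, x = np.ogrid[:10, :10]

# Broadcasting produces the full (10, 10) boolean array only here
mask = (x >= bad_starts) & (y < 5)

# Dense grids, as in the question, for comparison
yy, xx = np.mgrid[:10, :10]
dense_mask = (xx >= bad_starts) & (yy < 5)

sample[mask] = -1
print(np.array_equal(mask, dense_mask))
```

This avoids the two dense index grids, but the boolean mask itself is still a full (10, 10) array.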

A single value for each row can be fetched (or set) with:

In [491]: sample[np.arange(5)[:,None],bad_starts[:5]]
Out[491]: 
array([[-1],
       [-1],
       [-1],
       [-1],
       [-1]])

But there is no way to access all of the -1 values with basic slicing, because each row needs a slice of a different length:

In [492]: [sample[i,bad_starts[i,0]:] for i in range(5)]
Out[492]: 
[array([-1, -1, -1, -1, -1, -1]),
 array([-1, -1, -1]),
 array([-1, -1, -1, -1, -1, -1, -1]),
 array([-1, -1, -1, -1, -1, -1, -1, -1]),
 array([-1, -1, -1])]

A basic slice always selects a rectangular block, so no single slice can cover these ragged rows.
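If the goal is only to set the values, two alternatives seem reasonable. The sketch below (variable names looped, sliced and top are mine) shows a plain loop of per-row slice assignments, and a version that takes a basic slice of the first five rows so that the y < 5 test and the grids are not needed. The second one still builds a boolean mask, just a smaller one.

```python
import numpy as np

rand = np.random.default_rng(seed=0)
sample = rand.integers(low=0, high=10, size=(10, 10))
bad_starts = rand.integers(low=0, high=10, size=(10, 1))

# Reference result, computed the way the question does it
y, x = np.mgrid[:10, :10]
expected = sample.copy()
expected[(x >= bad_starts) & (y < 5)] = -1

# Option 1: one slice assignment per row (a Python loop, but no mask at all)
looped = sample.copy()
for i in range(5):
    looped[i, bad_starts[i, 0]:] = -1

# Option 2: slice the rows first, then mask only the columns.
# top is a view, so assigning through it modifies sliced.
sliced = sample.copy()
top = sliced[:5]
top[np.arange(10) >= bad_starts[:5]] = -1  # (10,) vs (5, 1) broadcasts to (5, 10)

print(np.array_equal(looped, expected), np.array_equal(sliced, expected))
```

For a handful of rows the loop is probably the most readable choice; for many rows the broadcast comparison is likely faster.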

The equivalent 'advanced indexing' arrays, derived from the mask, are:

In [494]: np.nonzero((x >= bad_starts) & (y < 5))
Out[494]: 
(array([0, 0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3,
        3, 3, 4, 4, 4]),
 array([4, 5, 6, 7, 8, 9, 7, 8, 9, 3, 4, 5, 6, 7, 8, 9, 2, 3, 4, 5, 6, 7,
        8, 9, 7, 8, 9]))
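These index arrays can also be built directly from the start positions, without ever forming the boolean mask. This is a sketch using np.repeat and np.cumsum; the names starts, lengths, run_offsets, rows and cols are mine, and it assumes every slice runs to the end of its row.

```python
import numpy as np

rand = np.random.default_rng(seed=0)
sample = rand.integers(low=0, high=10, size=(10, 10))
bad_starts = rand.integers(low=0, high=10, size=(10, 1))

n_rows = 5
n_cols = sample.shape[1]
starts = bad_starts[:n_rows, 0]   # start column of each row's slice
lengths = n_cols - starts         # number of selected columns per row

# Row index, repeated once per selected element
rows = np.repeat(np.arange(n_rows), lengths)

# Position of each row's run within the flattened result
run_offsets = np.cumsum(lengths) - lengths

# Running counter, reset at the start of each run, then shifted by the start column
cols = (np.arange(lengths.sum())
        - np.repeat(run_offsets, lengths)
        + np.repeat(starts, lengths))

sample[rows, cols] = -1
```

Memory use is proportional to the number of selected elements rather than to the full array, which may matter when the slices are short relative to the rows.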