Indexing a 2D array using a list of arrays

I want to index a 2d array based on a list of arrays.

import numpy as np

a = np.array([
    [1, 2, 3],
    [4, 5, 6]])

idx = [np.array([0, 1]), np.array([0, 2])]

What I then want is for the first element of idx to give a[0,1], the second a[0,2], and so on, such that:

a[fixed_idx]  # array([2, 3])

CodePudding user response：

IIUC, you could do:

a[tuple(zip(*idx))]

output: array([2, 3])
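
To see why this works, it may help to unpack the expression step by step. The sketch below reuses a and idx from the question; the names rows, cols, result and same are only there for illustration.

```python
import numpy as np

a = np.array([
    [1, 2, 3],
    [4, 5, 6]])
idx = [np.array([0, 1]), np.array([0, 2])]

# zip(*idx) regroups the (row, col) pairs into one tuple of rows
# and one tuple of columns: rows is (0, 0), cols is (1, 2)
rows, cols = zip(*idx)

# Indexing with a tuple of two sequences is advanced indexing,
# so this picks a[0, 1] and a[0, 2]
result = a[tuple(zip(*idx))]

# The same selection written out explicitly
same = a[list(rows), list(cols)]

print(result)  # [2 3]
```

Wrapping the zip in tuple(...) matters: numpy treats a tuple as one index per axis, whereas a list of sequences would be interpreted differently.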

CodePudding user response：

Suppose you have more indices, like:

dummy_idx = [n for n in np.random.randint(100, size=(1000000, 2))]

Then you need to get advanced indices x and y such that a[x, y] gives what you expect.

There are two easy ways to do that:

x, y = zip(*dummy_idx)
x, y = np.transpose(dummy_idx)

The first method is quite slow because numpy arrays are not designed for fast iteration, so accessing their items one by one takes much longer than numpy's vectorised operations. np.transpose, on the other hand, first collects the many small arrays into a new one, which is even worse: each of them has to be copied into its place in the new array, and that is even more expensive.

This is a red flag that you're trying to work with data structures numpy is not designed for. In practice, numpy is slow when you're working with lots of small arrays.
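
If the index pairs can be kept in a single (N, 2) array from the start, rather than in a Python list of small arrays, the conversion step disappears altogether. This is a minimal sketch under that assumption; the 100x100 array a and the names idx_arr and picked are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(0, 10, size=(100, 100))         # hypothetical 100x100 data
idx_arr = rng.integers(0, 100, size=(1000, 2))   # all (row, col) pairs in one array

# .T on a 2D array is a view, so splitting it into x and y
# involves no per-item Python iteration
x, y = idx_arr.T
picked = a[x, y]
```

Here picked[i] is a[idx_arr[i, 0], idx_arr[i, 1]].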

If you do have to work with a list of small arrays, there are two methods, np.ndarray.tolist and np.ndarray.tobytes, that are optimised a little better for repeated use. You can take advantage of this and mimic the behaviour of np.transpose(dummy_idx) about 30% faster:

ls = []
for n in dummy_idx: 
    ls.extend(n.tolist())
x, y = np.fromiter(ls, dtype=dummy_idx[0].dtype).reshape(-1, 2).T

and

b = bytearray()
for n in dummy_idx: 
    b.extend(n.tobytes())
x, y = np.frombuffer(b, dtype=dummy_idx[0].dtype).reshape(-1, 2).T

Results

zip - 161 ms
np.transpose - 205 ms
np.fromiter - 117 ms
np.frombuffer - 117 ms
a single loop over dummy_idx (for comparison) - 16 ms
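
To compare the four approaches on your own machine, something like the sketch below could be used. It relies on the standard-library timeit module, uses far fewer pairs than the 1,000,000 above so that it runs quickly, and the via_* function names are made up for this example. Absolute timings will differ from the figures listed above.

```python
import timeit
import numpy as np

# Smaller than the 1,000,000 pairs above so the sketch runs quickly
dummy_idx = [n for n in np.random.randint(100, size=(10_000, 2))]
dtype = dummy_idx[0].dtype

def via_zip():
    x, y = zip(*dummy_idx)
    return x, y

def via_transpose():
    x, y = np.transpose(dummy_idx)
    return x, y

def via_fromiter():
    ls = []
    for n in dummy_idx:
        ls.extend(n.tolist())
    x, y = np.fromiter(ls, dtype=dtype).reshape(-1, 2).T
    return x, y

def via_frombuffer():
    b = bytearray()
    for n in dummy_idx:
        b.extend(n.tobytes())
    x, y = np.frombuffer(b, dtype=dtype).reshape(-1, 2).T
    return x, y

methods = (via_zip, via_transpose, via_fromiter, via_frombuffer)
repeats = 5
for f in methods:
    seconds = timeit.timeit(f, number=repeats)
    print(f"{f.__name__}: {seconds / repeats * 1e3:.2f} ms per call")
```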