I want to index a 2d array based on a list of arrays.
a = np.array([
[1,2,3],
[4,5,6]])
idx = [np.array([0,1]), np.array([0,2])]
What I want is for the first element of idx to give a[0,1], the second to give a[0,2], and so on, such that:
a[fixed_idx] = array([2,3])
CodePudding user response:
IIUC, you could do:
a[tuple(zip(*idx))]
output: array([2, 3])
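To make the one-liner above easier to follow, here is a minimal sketch of what it does step by step, using the array and index list from the question. The names rows, cols and result are only illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])
idx = [np.array([0, 1]), np.array([0, 2])]

# zip(*idx) regroups the (row, col) pairs into one tuple of rows
# and one tuple of columns.
rows, cols = zip(*idx)

# Indexing with both sequences at once picks a[rows[i], cols[i]] for each i,
# which is what a[tuple(zip(*idx))] does in a single expression.
result = a[rows, cols]
print(result)
```

Here rows holds the first entry of every pair and cols the second, so the lookup visits a[0,1] and then a[0,2].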
CodePudding user response:
Suppose you have more indices, like:
dummy_idx = [n for n in np.random.randint(100, size=(1000000, 2))]
Then you need to get advanced indices x and y such that a[x, y] gives what you expect.
There are two easy ways to do that:
x, y = zip(*dummy_idx)
x, y = np.transpose(dummy_idx)
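As a quick sanity check, the sketch below suggests that the zip and np.transpose variants select the same elements and differ only in speed. The 100 x 100 array a and the smaller list of 1000 pairs are assumptions made for illustration, chosen so that every random index is in bounds.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed test data: a 100 x 100 array and 1000 random (row, col) pairs.
a = np.arange(100 * 100).reshape(100, 100)
dummy_idx = [n for n in rng.integers(100, size=(1000, 2))]

# Variant 1: regroup the pairs with zip (gives two tuples of scalars).
x_zip, y_zip = zip(*dummy_idx)

# Variant 2: let numpy stack the pairs and transpose them (gives two arrays).
x_tr, y_tr = np.transpose(dummy_idx)

# Both variants should pick exactly the same elements of a.
same = np.array_equal(a[x_zip, y_zip], a[x_tr, y_tr])
print(same)
```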
The zip method is quite slow because numpy arrays are not designed for fast iteration: accessing their items one by one takes much longer than numpy's vectorised operations. np.transpose, on the other hand, first collects the many small arrays into a new one, which is even worse because each of them has to be copied into its place in that new array.
This is a red flag that you're trying to work with a data structure numpy is not designed for. It is slow whenever you are working with a large number of small arrays.
However, there are two methods, np.ndarray.tolist and np.ndarray.tobytes, that are a little better optimised for repeated use. You can take advantage of this and mimic the behaviour of np.transpose(dummy_idx) in a way that is about 30% faster:
ls = []
for n in dummy_idx:
    ls.extend(n.tolist())  # append each pair as plain Python ints
x, y = np.fromiter(ls, dtype=dummy_idx[0].dtype).reshape(-1, 2).T
and
b = bytearray()
for n in dummy_idx:
    b.extend(n.tobytes())  # append the raw bytes of each pair
x, y = np.frombuffer(b, dtype=dummy_idx[0].dtype).reshape(-1, 2).T
Results:
zip - 161 ms
np.transpose - 205 ms
np.fromiter - 117 ms
np.frombuffer - 117 ms
a single loop over dummy_idx (for comparison) - 16 ms
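If you want to reproduce this kind of comparison yourself, a rough timing harness could look like the sketch below. The function names (via_zip and so on) are hypothetical, the list is shortened to 10,000 pairs so that it runs quickly, and the absolute timings will depend on your machine and numpy version.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
# Assumed smaller input than in the answer, so the sketch runs quickly.
dummy_idx = [n for n in rng.integers(100, size=(10_000, 2))]
dtype = dummy_idx[0].dtype

def via_zip():
    x, y = zip(*dummy_idx)
    return x, y

def via_transpose():
    return np.transpose(dummy_idx)

def via_fromiter():
    ls = []
    for n in dummy_idx:
        ls.extend(n.tolist())
    return np.fromiter(ls, dtype=dtype).reshape(-1, 2).T

def via_frombuffer():
    b = bytearray()
    for n in dummy_idx:
        b.extend(n.tobytes())
    return np.frombuffer(b, dtype=dtype).reshape(-1, 2).T

def baseline_loop():
    # Just iterate over the list, to show the cost of the loop itself.
    for n in dummy_idx:
        pass

# Best of three single runs for each variant, in seconds.
timings = {}
for f in (via_zip, via_transpose, via_fromiter, via_frombuffer, baseline_loop):
    timings[f.__name__] = min(timeit.repeat(f, number=1, repeat=3))

for name, seconds in timings.items():
    print(f"{name}: {seconds * 1000:.2f} ms")
```

Taking the minimum of a few repeats reduces noise from other processes, but treat the output as a relative comparison only.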