I have two arrays. vals has shape (N, m), where N is ~1 million and m is 3; the values are floats. indices has shape (N, 4), and all of its values are row indices into vals. (Additionally, unlike the example here, every row of indices contains unique values.)
import numpy as np
from random import randrange
# set up the arrays for this test example (no need to improve this)
N = 9
vals = np.array(list(range(3*N))).reshape((N,3))
indices = np.array([randrange(N) for n in range(4*N)]).reshape((N,4))
I would like to replace the following for loop, which creates the array aug:
# form an augmented matrix by indexing into vals using rows from indices
aug = np.stack([vals[indices[x]] for x in range(N)])
# compute a mean along axis=1 of aug
aug.mean(axis=1)
The broader context for the question: vals contains numeric data for particles distributed in 3D, and indices is generated by a nearest-neighbor search on the spatial positions of the particles (using scipy.spatial.cKDTree). I would like to average the numeric data over the nearest neighbors. As I have ~1 million particles, a for loop is quite slow.
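For reference, an indices array of this kind could be produced roughly as follows. This is only a minimal sketch, not the actual pipeline: the positions array, its random contents, and the seed are hypothetical stand-ins for the real particle positions.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)   # seed chosen only for reproducibility
N = 1000                         # small stand-in for ~1 million particles
positions = rng.random((N, 3))   # hypothetical particle positions in 3D

tree = cKDTree(positions)
# Query the 4 nearest neighbors of every particle. Because the query points
# are the tree's own points, each particle is normally among its own neighbors.
distances, indices = tree.query(positions, k=4)

# indices has shape (N, 4) and holds row indices into any per-particle array
print(indices.shape)
```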
CodePudding user response:
You can replace the entire aug = ... line with:
aug = vals[indices]
That will produce the same result:
np.array_equal(
np.stack([vals[indices[x]] for x in range(N)]),
vals[indices]
)
# True
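This works because indexing with an integer array returns an array of shape indices.shape + vals.shape[1:], here (N, 4, 3), so the neighbor average follows directly from a mean over axis 1. Below is a minimal sketch of the whole computation; the random test data and the names neighbor_mean and neighbor_mean_lowmem are assumptions made for illustration. The second variant is an optional alternative that avoids materializing the (N, 4, 3) intermediate, which may help if memory is tight at N of about 1 million.

```python
import numpy as np

rng = np.random.default_rng(0)             # seed chosen only for reproducibility
N = 9
vals = rng.random((N, 3))                  # (N, 3) float data
indices = rng.integers(0, N, size=(N, 4))  # (N, 4) row indices into vals

# Integer-array indexing gathers rows of vals for every entry of indices
aug = vals[indices]                        # shape (N, 4, 3)
neighbor_mean = aug.mean(axis=1)           # shape (N, 3)

# Lower-memory variant: accumulate one neighbor column at a time,
# so only (N, 3) temporaries are created instead of an (N, 4, 3) array
acc = np.zeros_like(vals)
for k in range(indices.shape[1]):
    acc += vals[indices[:, k]]
neighbor_mean_lowmem = acc / indices.shape[1]
```

The short loop in the second variant runs over the 4 neighbor columns, not over the N rows, so it stays vectorized along the large axis.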