I'm basically trying to take the weighted mean of a 3D dataset, but only on a filtered subset of the data, where the filter is based on another (2D) array. The shape of the 2D array matches the first two dimensions of the 3D data, so the same filter is applied to every slice along the third dimension.
Something like:
import numpy as np
myarr = np.array([[[4,6,8],[9,3,2]],[[2,7,4],[3,8,6]],[[1,6,7],[7,8,3]]])
myarr2 = np.array([[7,3],[6,7],[2,6]])
weights = np.random.rand(3,2,3)
filtered = []
for k in range(len(myarr[0,0,:])):
    temp1 = myarr[:,:,k]
    temp2 = weights[:,:,k]
    filtered.append(temp1[np.where(myarr2 > 5)]*temp2[np.where(myarr2 > 5)])
average = np.array(np.sum(filtered,1)/len(filtered[0]))
I am concerned about efficiency here. Is it possible to vectorize this so I don't need the loop, or are there other suggestions to make this more efficient?
CodePudding user response:
The most glaring efficiency issue, even the loop aside, is that np.where(...) is being called multiple times inside the loop, on the same condition! You can just do this a single time beforehand. Moreover, there is no need for a loop. Your operation basically equates to:
mask = myarr2 > 5
average = (myarr[mask] * weights[mask]).mean(axis=0)
There is no need for an np.where either.
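To check that claim, here is a small sketch that runs the loop from the question and the masked one-liner on the same inputs and compares the results. The seeded generator `rng` and the names `loop_average` and `vec_average` are my own additions for the comparison, not part of the original code.

```python
import numpy as np

# Seeded generator (an assumption, only so the check is repeatable)
rng = np.random.default_rng(0)

myarr = np.array([[[4, 6, 8], [9, 3, 2]],
                  [[2, 7, 4], [3, 8, 6]],
                  [[1, 6, 7], [7, 8, 3]]])
myarr2 = np.array([[7, 3], [6, 7], [2, 6]])
weights = rng.random((3, 2, 3))

# Loop version, as written in the question
filtered = []
for k in range(myarr.shape[2]):
    temp1 = myarr[:, :, k]
    temp2 = weights[:, :, k]
    filtered.append(temp1[np.where(myarr2 > 5)] * temp2[np.where(myarr2 > 5)])
loop_average = np.sum(filtered, 1) / len(filtered[0])

# Vectorized version: one boolean mask, no loop, no np.where
mask = myarr2 > 5
vec_average = (myarr[mask] * weights[mask]).mean(axis=0)

# Both give one value per slice along the third axis
assert np.allclose(loop_average, vec_average)
```

Note that both versions compute the plain mean of the value-times-weight products, which is what the question's loop does. If you actually want a mean normalized by the weights, something like `(myarr[mask] * weights[mask]).sum(axis=0) / weights[mask].sum(axis=0)` would be closer to that.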
myarr2 is an array of shape (i, j) with the same first two dims as myarr and weights, which have some shape (i, j, k).
So if there are n True elements in the boolean mask myarr2 > 5, you can apply it on your other arrays to obtain an array of shape (n, k) (taking all elements along the third axis wherever there is a True at a certain [i, j] position).
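The shape reasoning above can be seen directly on the arrays from the question. This is just an illustrative sketch; the names `n` and `selected` are introduced here and are not in the original code.

```python
import numpy as np

myarr = np.array([[[4, 6, 8], [9, 3, 2]],
                  [[2, 7, 4], [3, 8, 6]],
                  [[1, 6, 7], [7, 8, 3]]])   # shape (i, j, k) = (3, 2, 3)
myarr2 = np.array([[7, 3], [6, 7], [2, 6]])  # shape (i, j) = (3, 2)

mask = myarr2 > 5        # boolean array, same shape as myarr2
n = int(mask.sum())      # number of True entries in the mask

# Indexing with a 2D mask consumes the first two axes and keeps the third,
# so each True at [i, j] contributes the full vector myarr[i, j, :]
selected = myarr[mask]

print(mask.shape, n, selected.shape)  # (3, 2) 4 (4, 3)
print(selected)
```

The rows of `selected` come out in row-major order of the True positions, here [0, 0], [1, 0], [1, 1] and [2, 1], so taking `.mean(axis=0)` averages over the selected positions separately for each slice along the third axis.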