Python: create 3D array using values of another 3D array that meet a condition

I'm basically trying to take the weighted mean of a 3D dataset, but only on a filtered subset of the data, where the filter is based on another (2D) array. The shape of the 2D array matches the first two dimensions of the 3D data, so the same filter is applied to each slice along the third dimension.

Something like:

import numpy as np

myarr = np.array([[[4,6,8],[9,3,2]],[[2,7,4],[3,8,6]],[[1,6,7],[7,8,3]]])
myarr2 = np.array([[7,3],[6,7],[2,6]])
weights = np.random.rand(3,2,3)

filtered = []
for k in range(len(myarr[0,0,:])):
    temp1 = myarr[:,:,k]
    temp2 = weights[:,:,k]
    filtered.append(temp1[np.where(myarr2 > 5)]*temp2[np.where(myarr2 > 5)])

average = np.array(np.sum(filtered,1)/len(filtered[0]))

I am concerned about efficiency here. Is it possible to vectorize this so I don't need the loop, or are there other suggestions to make this more efficient?

CodePudding user response:

The most glaring efficiency issue, even setting the loop aside, is that np.where(...) is called twice per iteration, on the same condition every time. You can compute it once beforehand. Moreover, there is no need for a loop at all. Your operation is equivalent to:

mask = myarr2 > 5
average = (myarr[mask] * weights[mask]).mean(axis=0)

There is no need for np.where either: a boolean array can be used directly as an index.
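As a quick sanity check, the sketch below runs the loop from the question and the masked version side by side and compares the results. The arrays are the ones from the question; the seeded generator (`np.random.default_rng(0)`) is an assumption added only to make the run reproducible.

```python
import numpy as np

# Arrays from the question
myarr = np.array([[[4, 6, 8], [9, 3, 2]],
                  [[2, 7, 4], [3, 8, 6]],
                  [[1, 6, 7], [7, 8, 3]]])
myarr2 = np.array([[7, 3], [6, 7], [2, 6]])

# Assumption: fixed seed so the comparison is reproducible
rng = np.random.default_rng(0)
weights = rng.random((3, 2, 3))

# Loop version, following the question's logic
filtered = []
for k in range(myarr.shape[2]):
    idx = np.where(myarr2 > 5)
    filtered.append(myarr[:, :, k][idx] * weights[:, :, k][idx])
loop_average = np.sum(filtered, 1) / len(filtered[0])

# Vectorized version: index both 3D arrays with the 2D boolean mask
mask = myarr2 > 5
vec_average = (myarr[mask] * weights[mask]).mean(axis=0)

# Both produce one value per slice along the third axis
print(np.allclose(loop_average, vec_average))
```

Both results have shape (3,), one value per slice along the third axis.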

myarr2 is an array of shape (i, j), with the same first two dimensions as myarr and weights, which both have shape (i, j, k).

So if there are n True elements in the boolean mask myarr2 > 5, applying it to the other arrays yields arrays of shape (n, k): for every [i, j] position where the mask is True, all elements along the third axis are taken. Taking the mean over axis 0 then averages over those n positions and leaves k values.
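The shapes described above can be inspected directly. The sketch below uses the arrays from the question; the seeded weights and the final `np.average` call are assumptions added for illustration. The question's code divides by the number of selected positions, so `np.average` with `weights=` is shown only as an option if a mean normalized by the sum of the weights is what is actually wanted.

```python
import numpy as np

myarr = np.array([[[4, 6, 8], [9, 3, 2]],
                  [[2, 7, 4], [3, 8, 6]],
                  [[1, 6, 7], [7, 8, 3]]])
myarr2 = np.array([[7, 3], [6, 7], [2, 6]])
weights = np.random.default_rng(0).random((3, 2, 3))  # assumption: seeded for reproducibility

mask = myarr2 > 5          # shape (i, j) = (3, 2)
n = int(mask.sum())        # number of True positions
selected = myarr[mask]     # shape (n, k)
selected_w = weights[mask] # shape (n, k)

print(n)               # 4
print(selected.shape)  # (4, 3)

# Mean of the products, as in the question: divides by n
mean_of_products = (selected * selected_w).mean(axis=0)

# Alternative (assumption about intent): normalize by the sum of the weights
normalized = np.average(selected, axis=0, weights=selected_w)
```

The two results differ in general: the first divides by n, the second by the per-column sum of the selected weights.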