Say I have an array arr of shape (m, n) and a boolean array mask of the same shape as arr. For each row, I would like to obtain the first N elements of arr whose corresponding entries in mask are True.
An example:
import numpy as np

arr = np.array([[1, 2, 3, 4, 5],
                [6, 7, 8, 9, 10],
                [11, 12, 13, 14, 15]])
mask = np.array([[False, True, True, True, True],
                 [True, False, False, True, False],
                 [True, True, False, False, False]])
N = 2
Given the above, I would like to write a (vectorized) function that outputs the following:
output = maskify_n_columns(arr, mask, N)
# expected: np.array([[2, 3], [6, 9], [11, 12]])
I have written an iterative function; however, in the spirit of NumPy, I cannot accept an iterative solution in good conscience.
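def maskify_n_columns_iteratively(arr, mask, N):
    # Keep only the rows that have at least N True entries in mask
    valid_mask = (mask.sum(axis=1) >= N)
    arr, mask = arr[valid_mask], mask[valid_mask]
    l = arr.shape[0]
    output = np.zeros((l, N), dtype=arr.dtype)
    for i in range(l):
        # Column indices of the first N True entries in row i
        f = np.argwhere(mask[i]).reshape(-1)[:N]
        output[i] = arr[i, f]
    return output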
CodePudding user response:
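You can use broadcasting together with numpy.cumsum() and numpy.argmax(). The cumulative sum of mask along each row numbers the True entries 1, 2, 3, ...; comparing it against 1..N and taking argmax along the column axis gives, for each row, the column index of the first through N-th True entry.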
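def maskify_n_columns(arr, mask, N):
    # m[i, k] is the column of the (k+1)-th True entry in row i of mask
    m = (mask.cumsum(axis=1)[..., None] == np.arange(1, N + 1)).argmax(axis=1)
    # Pair each row index with its selected columns
    r = arr[np.arange(arr.shape[0])[:, None], m]
    return r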
maskify_n_columns(arr, mask, 2)
Output:
[[ 2  3]
 [ 6  9]
 [11 12]]
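One caveat with the approach above: if a row has fewer than N True entries, the comparison is all False for the missing positions, argmax returns 0 there, and the result silently picks up column 0. The question's loop version avoids this by dropping such rows first. Below is a minimal sketch of the same cumsum/argmax idea with that filter added. The function name maskify_n_columns_checked is made up for illustration, and dropping the short rows (rather than padding them) is an assumption taken from the question's iterative code.

```python
import numpy as np

def maskify_n_columns_checked(arr, mask, N):
    # Hypothetical helper name; not part of NumPy or of the answer above.
    # Keep only the rows that have at least N True entries in mask.
    valid = mask.sum(axis=1) >= N
    arr, mask = arr[valid], mask[valid]
    # counts[i, j] = number of True entries in mask[i, :j + 1]
    counts = mask.cumsum(axis=1)
    # hits[i, j, k] is True wherever the running count equals k + 1;
    # its shape is (rows, n, N).
    hits = counts[..., None] == np.arange(1, N + 1)
    # argmax along the column axis returns the first such j, which is
    # the column of the (k+1)-th True entry in row i.
    cols = hits.argmax(axis=1)
    return arr[np.arange(arr.shape[0])[:, None], cols]

arr = np.arange(1, 16).reshape(3, 5)
mask = np.array([[False, True, True, True, True],
                 [True, False, False, True, False],
                 [True, True, False, False, False]])

result_2 = maskify_n_columns_checked(arr, mask, 2)  # all three rows qualify
result_3 = maskify_n_columns_checked(arr, mask, 3)  # only the first row has 3 True entries
print(result_2)
print(result_3)
```

With N = 3 only the first row survives the filter, so the result has a single row. If you need to keep the row positions, you would have to return valid as well.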
CodePudding user response:
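Here is another way to solve this, using boolean indexing and np.split(). Note that the final step is a list comprehension over the rows, so it is not fully vectorized, and it returns a list of lists rather than an array:
def maskify_n_columns(arr, mask, N):
    # arr[mask] flattens the selected values row by row; split them back
    # into rows at the cumulative per-row counts of True entries
    boundaries = mask.sum(axis=1).cumsum()[:-1]
    return [x[:N].tolist() for x in np.split(arr[mask], boundaries)]
maskify_n_columns(arr, mask, N)
# output [[2, 3], [6, 9], [11, 12]]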
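As a quick sanity check of the split-based idea, here is a small sketch that compares it with a plain per-row loop on random input. The names first_n_by_split and first_n_by_loop are invented for this example, and the test data is forced to have at least N True entries per row, which is an assumption rather than something the question guarantees.

```python
import numpy as np

def first_n_by_split(arr, mask, N):
    # Hypothetical name. arr[mask] flattens the selected values in
    # row-major order, so the cumulative per-row True counts mark
    # where each row's values end.
    boundaries = mask.sum(axis=1).cumsum()[:-1]
    return [row[:N].tolist() for row in np.split(arr[mask], boundaries)]

def first_n_by_loop(arr, mask, N):
    # Plain per-row reference implementation used for comparison.
    return [arr[i, mask[i]][:N].tolist() for i in range(arr.shape[0])]

rng = np.random.default_rng(0)
arr = rng.integers(0, 100, size=(6, 8))
mask = rng.random((6, 8)) < 0.5
N = 3
mask[:, -N:] = True  # assumption: guarantee at least N True entries per row

split_result = first_n_by_split(arr, mask, N)
loop_result = first_n_by_loop(arr, mask, N)
print(split_result == loop_result)
```

The key detail is that the split points come from the number of True entries in each row, not from the shape of the array.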