NumPy: Find first n columns according to mask

Time:12-09

Say I have an array arr of shape (m, n) and a boolean array mask of the same shape. For each row, I would like to obtain the first N elements of arr whose corresponding entries in mask are True.

An example:

import numpy as np

arr = np.array([[1,2,3,4,5],
                [6,7,8,9,10],
                [11,12,13,14,15]])

mask = np.array([[False, True, True, True, True],
                [True, False, False, True, False],
                [True, True, False, False, False]]) 

N = 2

Given the above, I would like to write a (vectorized) function that outputs the following:

output = maskify_n_columns(arr, mask, N)
output = np.array(([2,3],[6,9],[11,12]))

I have written an iterative function; however, in the spirit of NumPy, I cannot accept an iterative solution in good conscience.

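def maskify_n_columns_iteratively(arr, mask, N):
    # Keep only rows that have at least N True entries in mask
    valid_mask = (mask.sum(axis=1) >= N)

    arr = arr[valid_mask]
    mask = mask[valid_mask]
    l = arr.shape[0]

    output = np.zeros((l, N), dtype=arr.dtype)

    for i in range(l):
        # Column indices of the first N True entries in row i
        f = np.argwhere(mask[i]).reshape(-1)[:N]
        output[i] = arr[i, f]

    return output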

CodePudding user response:

You can use broadcasting, numpy.cumsum() and numpy.argmax().

def maskify_n_columns(arr, mask, N):
    # For k = 1..N, find the first column where the running count of True values equals k
    m = (mask.cumsum(axis=1)[..., None] == np.arange(1, N + 1)).argmax(axis=1)
    r = arr[np.arange(arr.shape[0])[:, None], m]
    return r

maskify_n_columns(arr, mask, 2)

Output:

[[ 2  3]
 [ 6  9]
 [11 12]]
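
The answer above does not say what happens when a row has fewer than N True entries. In that case argmax runs over an all-False slice and returns 0, so the value in column 0 is returned silently. Below is a minimal sketch of the same cumsum/argmax technique with the intermediate arrays named, assuming such rows should simply be dropped (as the iterative version in the question does). The name maskify_n_columns_checked is hypothetical.

```python
import numpy as np

def maskify_n_columns_checked(arr, mask, N):
    # Hypothetical variant: drop rows with fewer than N True entries,
    # because argmax would silently return column 0 for them.
    valid = mask.sum(axis=1) >= N
    arr, mask = arr[valid], mask[valid]
    # counts[i, j]: number of True entries in row i up to and including column j
    counts = mask.cumsum(axis=1)
    # hits[i, j, k]: True where the running count equals k + 1
    hits = counts[..., None] == np.arange(1, N + 1)
    # The first column where the count equals k + 1 is the (k + 1)-th True column
    cols = hits.argmax(axis=1)
    # Pair each row index with its selected column indices
    return arr[np.arange(arr.shape[0])[:, None], cols]
```

The broadcasted comparison builds a temporary array of shape (m, n, N), so memory use grows with N.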

CodePudding user response:

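Here is one way to solve this using boolean indexing and numpy.split(). Note that the final truncation step is a list comprehension over rows, so it is not fully vectorized:

def maskify_n_columns(arr, mask, N):
    # Positions in the flattened selection where each row's values end
    row_ends = np.cumsum(mask.sum(axis=1))[:-1]
    # arr[mask] returns the selected values flattened in row-major order
    return [x.tolist()[:N] for x in np.split(arr[mask], row_ends)]

maskify_n_columns(arr, mask, N)
# output [[2, 3], [6, 9], [11, 12]]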
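
The list comprehension above still loops over rows in Python. A different technique, not the one used in this answer, is a stable argsort of the inverted mask: it moves the True columns to the front of each row while preserving their order. The sketch below assumes every row has at least N True entries; the name maskify_n_columns_argsort is hypothetical.

```python
import numpy as np

def maskify_n_columns_argsort(arr, mask, N):
    # Hypothetical helper; assumes each row of mask has at least N True entries.
    # Sorting ~mask puts False (i.e. mask == True) first; kind="stable"
    # keeps those columns in their original left-to-right order.
    idx = np.argsort(~mask, axis=1, kind="stable")[:, :N]
    # Gather the chosen columns row by row.
    return np.take_along_axis(arr, idx, axis=1)
```

If a row has fewer than N True entries, the remaining slots are filled with values from masked-out columns, so such rows would need to be filtered beforehand.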