Vectorization: Each row of the mask contains the column indices to mask for the corresponding row of

Time:10-12

I have an array and a mask array with the same number of rows. Each row of the mask contains the column indices to set in the corresponding row of the array. How can I vectorize this instead of using a for loop?

The code looks like this:

import numpy as np

a = np.zeros((2, 4))
mask = np.array([[2, 3], [0, 1]])

# I'd like a vectorized way to do this (because the numbers of rows and columns are large):
a[0, mask[0]] = 1
a[1, mask[1]] = 1

This is what I want to obtain:

array([[0., 0., 1., 1.],
       [1., 1., 0., 0.]])

==================================

The question has been answered by @mozway, but @AhmedAEK questioned whether the vectorized solution is actually faster than the for loop. So I compared the two:

N = 5000
M = 10000
a = np.zeros((N, M))

# choice without replacement: 3 distinct column indices per row
mask = np.random.rand(N, M).argpartition(3, axis=1)[:, :3]

def t1():
    # for-loop version
    for i in range(N):
        a[i, mask[i]] = 1

def t2():
    # vectorized version
    a[np.arange(a.shape[0])[:, None], mask] = 1

Then I timed both functions with %timeit in Jupyter and posted the results as a screenshot:

[screenshot of the %timeit results for t1 and t2]
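For anyone who wants to rerun the comparison outside Jupyter, here is a minimal sketch using the standard timeit module. The sizes N, M and K below are assumptions, smaller than in the question so that it runs quickly, and the function names are only illustrative. Timings depend on the machine, so no numbers are shown here.

```python
import timeit

import numpy as np

# Assumed sizes, smaller than in the question to keep the sketch quick.
N, M, K = 500, 1000, 3
rng = np.random.default_rng(0)

# K distinct column indices per row (choice without replacement).
mask = rng.random((N, M)).argpartition(K, axis=1)[:, :K]

def loop_version():
    # Each call allocates a fresh array, so both timings include np.zeros.
    a = np.zeros((N, M))
    for i in range(N):
        a[i, mask[i]] = 1
    return a

def vectorized_version():
    a = np.zeros((N, M))
    a[np.arange(N)[:, None], mask] = 1
    return a

# Both approaches should produce the same array.
same = np.array_equal(loop_version(), vectorized_version())

loop_time = timeit.timeit(loop_version, number=20)
vec_time = timeit.timeit(vectorized_version, number=20)
print(f"same result: {same}")
print(f"loop: {loop_time:.4f}s, vectorized: {vec_time:.4f}s")
```

Unlike t1 and t2 above, these functions build a new array on every call, so the allocation cost is part of both measurements.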

CodePudding user response:

You can use:

a[[[0],[1]], mask] = 1

Or, generating the row indexer programmatically:

a[np.arange(a.shape[0])[:,None], mask] = 1

Output:

array([[0., 0., 1., 1.],
       [1., 1., 0., 0.]])
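
This works because the row indexer has shape (N, 1) and mask has shape (N, K). NumPy broadcasts the two to (N, K), so each row number is paired with every column index in the same row of mask. Below is a minimal sketch that checks the shapes; it also shows np.put_along_axis as a possible alternative, which is not part of the answer above. The names rows and b are only illustrative.

```python
import numpy as np

a = np.zeros((2, 4))
mask = np.array([[2, 3], [0, 1]])

# Row indexer with shape (2, 1): [[0], [1]].
rows = np.arange(a.shape[0])[:, None]

# rows broadcasts against mask (shape (2, 2)), so position (i, j) of the
# index pair addresses a[i, mask[i, j]].
a[rows, mask] = 1

# Alternative, not from the answer above: put_along_axis writes the value
# at the given indices along axis 1, modifying b in place.
b = np.zeros((2, 4))
np.put_along_axis(b, mask, 1, axis=1)

print(a)
print(np.array_equal(a, b))
```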