Home > database >  How to improve partial 2D array filling/filtering in numpy
How to improve partial 2D array filling/filtering in numpy

Time:11-29

I'm currently reading reviewing code. I have a double loop to filter a 2D array according to a 1D array. Here is the code:

import numpy as np

size_a = 500
size_b = 2000
a = np.random.rand(size_a)            
c = np.random.rand(size_a*size_b).reshape((size_a, size_b))
d = np.random.rand(size_b)
                                   
# double loop for filtering c according to d                           
for i in range(size_b):
    for j in range(size_a):
        if a[j] <= d[i]:
            c[j,i] = 0

# first improvement
for i in range(size_b):
    c[a <= d[i], i] = 0

Can we do better ? In terms of speed of execution.

CodePudding user response:

Let's describe what you want to do in words:

  • You have a matrix c of shape (size_a, size_b).
  • You have a vector a with one element per row of c
  • You have a vector d with one element per column of c

In those locations where a[i] <= d[j], you want to set c to zero.

Let's say we have:

size_a = 3
size_b = 5

a = np.array([23, 55, 37])
c = np.array([[56, 37, 50, 49, 57],
       [81, 50, 98, 11,  9],
       [52, 47, 23, 64, 20]])

d = np.array([27, 16, 74, 95,  8])

I obtained these using np.random.randint(0, 100, shape) for each of the arrays, but I've hardcoded them here for ease of reproduction and understanding

You can compare the vectors a and d at every combination of i and j two ways:

  1. Cast them into matrices of the same size:
a_cast = np.tile(a, (size_b, 1)).T
# array([[23, 23, 23, 23, 23],
#        [55, 55, 55, 55, 55],
#        [37, 37, 37, 37, 37]])

d_cast = np.tile(d, (size_a, 1))
# array([[27, 16, 74, 95,  8],
#        [27, 16, 74, 95,  8],
#        [27, 16, 74, 95,  8]])

mask = a_cast <= d_cast
# array([[ True, False,  True,  True, False],
#        [False, False,  True,  True, False],
#        [False, False,  True,  True, False]])

c[mask] = 0
  1. Have numpy do the broadcasting for you, by making a a (size_a, 1) matrix and d a (1, size_b) matrix. This is done by taking a slice of each vector, with an extra None axis, which creates an extra axis of length 1 (see In numpy, what does selection by [:,None] do?):
mask = a[:, None] <= d[None, :]

# a[:, None] gives:
# array([[23],
#        [55],
#        [37]])

# d[None, :] gives:
# array([[27, 16, 74, 95,  8]])


c[mask] = 0

Both these methods result in the same c:

array([[ 0, 37,  0,  0, 57],
       [81, 50,  0,  0,  9],
       [52, 47,  0,  0, 20]])
  • Related