I'm currently reading reviewing code. I have a double loop to filter a 2D array according to a 1D array. Here is the code:
import numpy as np
size_a = 500
size_b = 2000
a = np.random.rand(size_a)
c = np.random.rand(size_a*size_b).reshape((size_a, size_b))
d = np.random.rand(size_b)
# double loop for filtering c according to d
for i in range(size_b):
for j in range(size_a):
if a[j] <= d[i]:
c[j,i] = 0
# first improvement
for i in range(size_b):
c[a <= d[i], i] = 0
Can we do better ? In terms of speed of execution.
CodePudding user response:
Let's describe what you want to do in words:
- You have a matrix
c
of shape(size_a, size_b)
. - You have a vector
a
with one element per row ofc
- You have a vector
d
with one element per column ofc
In those locations where a[i] <= d[j]
, you want to set c
to zero.
Let's say we have:
size_a = 3
size_b = 5
a = np.array([23, 55, 37])
c = np.array([[56, 37, 50, 49, 57],
[81, 50, 98, 11, 9],
[52, 47, 23, 64, 20]])
d = np.array([27, 16, 74, 95, 8])
I obtained these using np.random.randint(0, 100, shape)
for each of the arrays, but I've hardcoded them here for ease of reproduction and understanding
You can compare the vectors a
and d
at every combination of i
and j
two ways:
- Cast them into matrices of the same size:
a_cast = np.tile(a, (size_b, 1)).T
# array([[23, 23, 23, 23, 23],
# [55, 55, 55, 55, 55],
# [37, 37, 37, 37, 37]])
d_cast = np.tile(d, (size_a, 1))
# array([[27, 16, 74, 95, 8],
# [27, 16, 74, 95, 8],
# [27, 16, 74, 95, 8]])
mask = a_cast <= d_cast
# array([[ True, False, True, True, False],
# [False, False, True, True, False],
# [False, False, True, True, False]])
c[mask] = 0
- Have numpy do the broadcasting for you, by making
a
a(size_a, 1)
matrix andd
a(1, size_b)
matrix. This is done by taking a slice of each vector, with an extraNone
axis, which creates an extra axis of length 1 (see In numpy, what does selection by [:,None] do?):
mask = a[:, None] <= d[None, :]
# a[:, None] gives:
# array([[23],
# [55],
# [37]])
# d[None, :] gives:
# array([[27, 16, 74, 95, 8]])
c[mask] = 0
Both these methods result in the same c
:
array([[ 0, 37, 0, 0, 57],
[81, 50, 0, 0, 9],
[52, 47, 0, 0, 20]])