Given the following numpy arrays:
import numpy as np
import time
v1 = np.linspace(20, 250, 100000000)
a = np.array([12.592, 16.320])
m = np.array([3, 5])
How is it possible that this list comprehension:
start = time.time()
v2 = np.max(
    [10 ** _a * np.power(v1.astype(float), -_m) for _m, _a in zip(m, a)],
    axis=0
)
end = time.time()
print(end - start) # prints 5.822041034698486
is more than twice as fast as numpy's broadcasting?
start = time.time()
v2 = np.max(
    np.power(10, a) * np.power(v1.astype(float)[:, None], -m).reshape(v1.shape[0], -1),
    axis=1
)
end = time.time()
print(end - start) # prints 12.292157173156738
And what is the "fastest approach" to calculating v2?
CodePudding user response:
Both are inefficient. Indeed, you can rewrite the operation much more efficiently using the mathematical properties of the natural logarithm and the exponential: 10**_a * v1**(-_m) is equal to np.exp(_a * np.log(10) - _m * np.log(v1)). Here is an equivalent (unoptimized) expression that can be further optimized:
v2 = np.max(
    [np.exp(_a * np.log(10) - _m * np.log(v1.astype(float))) for _m, _a in zip(m, a)],
    axis=0
)
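As a quick sanity check (an illustrative addition, not part of the original answer; small, ref and alt are hypothetical names), you can verify on a small slice that the rewritten expression matches the original one:
small = v1[:1000]  # a small slice keeps the check cheap
ref = np.max([10 ** _a * np.power(small, -_m) for _m, _a in zip(m, a)], axis=0)
alt = np.max([np.exp(_a * np.log(10) - _m * np.log(small)) for _m, _a in zip(m, a)], axis=0)
assert np.allclose(ref, alt)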
Since np.log(v1.astype(float)) is a constant, you can pre-compute it. Actually, v1 items are already of type float (note that float means np.float64 for Numpy and not the CPython float object type), so np.log(v1) will do the job correctly (unless v1 is set to an array of np.float32, for example). Additionally, np.exp can be computed only on the final result (i.e. after computing np.max), since a < b is equivalent to e**a < e**b (the exponential is strictly increasing). Finally, you can use in-place operations to avoid creating several expensive temporary arrays. This can be done using the out parameter of many functions like np.subtract or np.multiply, for example.
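To make the out idiom concrete, here is a minimal toy illustration (not from the original answer; x and buf are hypothetical names):
x = np.array([1.0, 2.0, 3.0, 4.0])
buf = np.empty(4)
np.multiply(x, 2.0, out=buf)    # result is written into buf: no temporary array
np.subtract(buf, 1.0, out=buf)  # chained in-place update of the same buffer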
Here is the resulting code:
log_v1 = np.log(v1)  # pre-computed once and reused for every (m, a) pair
tmp = np.empty((len(a), v1.size), dtype=np.float64)  # scratch rows updated in-place
v2 = np.exp(np.max(
    [np.subtract(_a * np.log(10), np.multiply(log_v1, _m, out=_tmp), out=_tmp)
     for _m, _a, _tmp in zip(m, a, tmp)],
    axis=0,
    out=tmp[0]
), out=tmp[0])
For even faster performance, you can simply use Numba to avoid writing huge arrays to memory (which is slow). Numba can also use multiple threads to speed this up a lot. Here is an example:
import numba as nb

# Note that fastmath=True can cause issues with values like NaN and Inf.
# Please disable it if your input array contains such special values.
@nb.njit('float64[::1](float64[::1], float64[::1], float64[::1])', fastmath=True, parallel=True)
def compute(v1, m, a):
    assert a.size > 0 and a.size == m.size
    out = np.empty(v1.size, dtype=np.float64)
    log10 = np.log(10)
    for i in nb.prange(v1.size):
        log_v1 = np.log(v1[i])
        maxi = a[0] * log10 - m[0] * log_v1
        for j in range(1, len(a)):
            value = a[j] * log10 - m[j] * log_v1
            if value > maxi:
                maxi = value
        out[i] = np.exp(maxi)
    return out

v2 = compute(v1, m.astype(float), a)
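If you want to cap the number of threads used by the prange loop, Numba also exposes set_num_threads (an optional illustration, not part of the original answer):
nb.set_num_threads(4)  # e.g. restrict the parallel loop to 4 threads
v2 = compute(v1, m.astype(float), a)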
Benchmark
Here are results on my 6-core machine:
Initial code with big arrays: 10.013 s (inefficient)
Initial code with lists: 8.147 s (inefficient)
Optimized Numpy code: 2.009 s (memory-bound)
Optimized Numba code: 0.300 s (compute-bound)
As you can see, Numba is much faster than the other methods: about 30 times faster than the initial code.
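For completeness, a minimal best-of-N timing harness along these lines could look as follows (the bench helper is an illustrative addition; absolute numbers will differ across machines):
def bench(label, fn, repeat=3):
    # Report the best wall-clock time over several runs.
    best = float('inf')
    for _ in range(repeat):
        start = time.time()
        fn()
        best = min(best, time.time() - start)
    print(f'{label}: {best:.3f} s')

bench('Optimized Numba code', lambda: compute(v1, m.astype(float), a))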