I have an array, x, which I want to make monotonic. Specifically, I want to do

y = x.copy()
for i in range(len(x)):
    y[i] = np.max(x[:i])

This is extremely slow for large arrays, but it feels like there should be a more efficient way of doing this. How can this operation be sped up?
CodePudding user response:
I think it will be faster to just keep track of the running maximum, rather than recomputing it from scratch for each sub-array:

y = x.copy()
_max = y[0]
for i in range(1, len(x)):
    y[i] = _max
    _max = max(x[i], _max)

This keeps the question's convention that y[i] is the maximum of x[:i]; y[0] simply stays x[0].
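For concreteness, here is the same idea run end to end on a small made-up input (values chosen only for illustration):

import numpy as np

x = np.array([2, 1, 3, 2])  # example input, values made up
y = x.copy()
_max = y[0]
for i in range(1, len(x)):
    y[i] = _max
    _max = max(x[i], _max)

print(y)  # [2 2 2 3]: each y[i] is the maximum of x[:i], with y[0] = x[0]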
CodePudding user response:
You can use a list comprehension for this, but note that the loop effectively needs to start from 1 rather than 0 (the slice x[:0] is empty). Either keep the loop starting from 0 and include position i:

y = [np.max(x[:i+1]) for i in range(len(x))]

or start from 1:

y = [np.max(x[:i]) for i in range(1, len(x)+1)]

Both produce the same result, but they are still O(n²), like the original loop.
CodePudding user response:
The OP implementation is very inefficient because it does not use the information acquired on the previous iteration, resulting in O(n²) complexity:

import numpy as np

def max_acc_OP(arr):
    result = np.empty_like(arr)
    for i in range(len(arr)):
        result[i] = np.max(arr[:i+1])
    return result

Note that I fixed the code so that it takes the largest value among those up to and including position i. It is easy to rewrite it so that the value at position i is excluded. The same holds for the code below.
Instead, if you keep track of the current maximum and use it to fill your output array, you can easily get to O(n) complexity:

def max_acc(arr):
    result = np.empty_like(arr)
    curr_max = arr[0]
    for i, x in enumerate(arr):
        if x > curr_max:
            curr_max = x
        result[i] = curr_max
    return result
However, this is still relatively slow because of the explicit looping.
Luckily, one can either rewrite this in vectorized form, combining np.fmax() (or np.maximum(), depending on how you need NaNs to be handled) with np.ufunc.accumulate():
np.fmax.accumulate()
# or
np.maximum.accumulate()
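The NaN handling is the only difference between the two: np.maximum propagates NaNs, while np.fmax skips them. A minimal check, with made-up values:

import numpy as np

x = np.array([1.0, np.nan, 3.0, 2.0])
np.maximum.accumulate(x)  # array([ 1., nan, nan, nan]): NaN propagates
np.fmax.accumulate(x)     # array([1., 1., 3., 3.]): NaN is skipped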
or accelerate the solution above with Numba:

import numba as nb

max_acc_nb = nb.njit(max_acc)
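Before timing, it is worth warming up the Numba-compiled function (the first call triggers JIT compilation) and sanity-checking that all variants agree. A minimal sketch, with a small throwaway input:

arr = np.random.randint(0, 100, 10)  # made-up input, just for the check
max_acc_nb(arr)  # first call compiles; subsequent calls are fast

assert np.array_equal(max_acc_OP(arr), max_acc(arr))
assert np.array_equal(max_acc(arr), np.maximum.accumulate(arr))
assert np.array_equal(max_acc(arr), max_acc_nb(arr))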
Some timings on relatively large inputs are provided below:
n = 10000
arr = np.random.randint(0, n, n)
%timeit -n 4 -r 4 max_acc_OP(arr)
# 97.5 ms ± 14.2 ms per loop (mean ± std. dev. of 4 runs, 4 loops each)
%timeit -n 4 -r 4 np.fmax.accumulate(arr)
# 112 µs ± 134 µs per loop (mean ± std. dev. of 4 runs, 4 loops each)
%timeit -n 4 -r 4 np.maximum.accumulate(arr)
# 88.4 µs ± 107 µs per loop (mean ± std. dev. of 4 runs, 4 loops each)
%timeit -n 4 -r 4 max_acc(arr)
# 2.32 ms ± 146 µs per loop (mean ± std. dev. of 4 runs, 4 loops each)
%timeit -n 4 -r 4 max_acc_nb(arr)
# 9.11 µs ± 3.01 µs per loop (mean ± std. dev. of 4 runs, 4 loops each)
indicating that max_acc() is already much faster than max_acc_OP(), but np.maximum.accumulate() / np.fmax.accumulate() is even faster, and max_acc_nb() comes out as the fastest. As always, it is important to take these kinds of numbers with a grain of salt.