I want to find all the numbers that are not in a list and that are given by the following formula: x + x_max * y + x_max * y_max * z, where (x_max, y_max) are parameters and (x, y, z) can vary in a restricted set. I also want to know the values of (x, y, z) needed to obtain each number.
I have managed to do this with the following code, but it is currently extremely slow because of the three nested loops. I am looking for a faster solution (maybe using NumPy arrays?).
Here is what my code currently does, extremely slowly (but it works):
import numpy as np
x_max, y_max, z_max = 180, 90, 90
numbers = np.random.randint(x_max * y_max * z_max, size=1000)
results = np.array([[0, 0, 0, 0]]) # Initializes the array
for x in range(0, x_max):
    for y in range(0, y_max):
        for z in range(0, z_max):
            result = x + y * x_max + z * x_max * y_max
            if result not in numbers:
                results = np.append(results, [[x, y, z, result]], axis=0)

# Initial line is useless
results = np.delete(results, (0), axis=0)
CodePudding user response:
Your nested loops and the calculation:
for x in range(0, x_max):
    for y in range(0, y_max):
        for z in range(0, z_max):
            result = x + y * x_max + z * x_max * y_max
simply calculate all integers from 0 up to (but not including) x_max * y_max * z_max. And all the integers are unique as well: no integer is calculated twice.
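A quick way to convince yourself of this is to run the loop with tiny sizes and compare against the full range (a throwaway sanity check, not part of the solution below):
x_max, y_max, z_max = 4, 3, 2  # tiny sizes, just for this check
values = [x + y * x_max + z * x_max * y_max
          for x in range(x_max) for y in range(y_max) for z in range(z_max)]
# Every integer in [0, x_max * y_max * z_max) appears exactly once.
assert sorted(values) == list(range(x_max * y_max * z_max))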
That fact makes this a lot easier:
values = np.arange(x_max * y_max * z_max)
results = np.setdiff1d(values, numbers)
This will give you all the integers that have been calculated and are not in the numbers exclusion list.
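For illustration, here is how np.setdiff1d behaves on a tiny made-up example (my own numbers, unrelated to the question's sizes); note that the result comes back sorted and without duplicates:
import numpy as np

values = np.arange(10)
numbers = np.array([3, 7, 7, 12])     # duplicates and out-of-range entries are fine
print(np.setdiff1d(values, numbers))  # [0 1 2 4 5 6 8 9]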
Now you are only missing the input x, y and z values. These, though, can be calculated back from the actual results with some straightforward modulo arithmetic:
z = results // (x_max * y_max)
rem = results % (x_max * y_max)
y = rem // x_max
x = rem % x_max
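As a sanity check (a throwaway snippet that repeats the steps above with the question's sizes), recombining x, y and z through the original formula reproduces every result exactly:
import numpy as np

x_max, y_max, z_max = 180, 90, 90
numbers = np.random.randint(x_max * y_max * z_max, size=1000)

results = np.setdiff1d(np.arange(x_max * y_max * z_max), numbers)
z = results // (x_max * y_max)
rem = results % (x_max * y_max)
y = rem // x_max
x = rem % x_max

# Recombining must give back exactly the filtered values.
assert np.array_equal(x + y * x_max + z * x_max * y_max, results)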
Now you can stack it all nicely together:
results = np.array([x, y, z, results])
You can tweak your results array if needs be, e.g.:
results = results.T # simple transpose
results = np.sort(results, axis=1) # sort the inner list
I used the above to compare the output from this calculation with that of the triple-nested loop calculation. The results were indeed equal.
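For reference, a comparison along those lines could look roughly like this (my own sketch; loop_results and vectorized_results are placeholder names for the two (N, 4) arrays, and the rows are ordered by the result column first because the triple loop does not emit them in ascending order):
import numpy as np

def ordered_by_result(arr):
    # Placeholder helper: sort the (N, 4) rows by their last column so both versions line up.
    return arr[np.argsort(arr[:, 3])]

assert np.array_equal(ordered_by_result(loop_results),
                      ordered_by_result(vectorized_results))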
CodePudding user response:
Ok, so first of all, the main reason your code is slow is not the nested looping, it's that you're using np.append: this allocates an entirely new array with each append, and should almost never be used. You're much better off just using a python list, for which appending is an amortized O(1) operation (internally, it only actually reallocates memory when the list grows past some multiple (I think something like 1 + 1/e times) of its previous size).
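The difference is easy to see with a minimal timing sketch (my own, self-contained, with a deliberately modest N so the np.append version finishes in reasonable time):
import time
import numpy as np

N = 20_000

t0 = time.time()
arr = np.empty((0, 4), dtype=int)
for i in range(N):
    arr = np.append(arr, [[i, i, i, i]], axis=0)  # reallocates and copies the whole array every time
print(f'np.append in a loop:    {time.time() - t0:.2f}s')

t0 = time.time()
lst = []
for i in range(N):
    lst.append([i, i, i, i])                      # amortized O(1) per append
arr2 = np.array(lst)                              # convert once at the end
print(f'list.append + np.array: {time.time() - t0:.2f}s')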
The following should run somewhere on the order of 100x faster.
import numpy as np
import time

x_max, y_max, z_max = 180, 90, 90
numbers = np.random.randint(x_max * y_max * z_max, size=1000)

# results = np.array([[0, 0, 0, 0]]) # Initializes the array <-- Unnecessary
results = []  # <-- Use a python list when you want to expand things

t_start = time.time()
for x in range(0, x_max):
    print(f'x={x}')
    for y in range(0, y_max):
        for z in range(0, z_max):
            result = x + y * x_max + z * x_max * y_max
            if result not in numbers:
                # results = np.append(results, [[x, y, z, result]], axis=0)  # <-- np.append is very slow: O(N) per call
                results.append([x, y, z, result])  # <-- list.append is amortized O(1)
results = np.array(results)
# Initial line is useless
# results = np.delete(results, (0), axis=0)  # <-- Unnecessary without the unnecessary initialization.
... but we can still go faster using numpy vectorization.
# ... See above code for computation of "results"
print(f'Found result with python loop and list in {time.time() - t_start:.3f}s')
t_start = time.time()
# Get the x, y, z indices that you'd normally get from a nested loop. They'll be in arrays of shape (x_max, y_max, z_max)
xs, ys, zs = np.meshgrid(np.arange(x_max), np.arange(y_max), np.arange(z_max), indexing='ij')
all_values = xs + ys * x_max + zs * x_max * y_max
valid_indices = ~np.isin(all_values, numbers) # Get a shape (x_max, y_max, z_max) boolean mask
# Now use the mask to filter each array: v[valid_indices] gives a flat array of the kept entries
# ... Then reshape it into a column of shape (n_kept, 1)
# ... So it can be stacked horizontally (h-stack) with the others along the second axis.
results_vectorized = np.hstack([v[valid_indices].reshape(-1, 1) for v in (xs, ys, zs, all_values)])
assert np.array_equal(results_vectorized, results)
print(f'Found result in {time.time() - t_start:.3f}s')
This is around 20x faster than the previous:
Found result with python loop and list in 3.630s
Found result in 0.154s
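For reference, this is what np.meshgrid with indexing='ij' produces on a tiny example of my own (the 'ij' ordering is what makes the axes line up with the loop order x, y, z):
import numpy as np

xs, ys = np.meshgrid(np.arange(2), np.arange(3), indexing='ij')
print(xs)  # the x index, varying along axis 0
# [[0 0 0]
#  [1 1 1]]
print(ys)  # the y index, varying along axis 1
# [[0 1 2]
#  [0 1 2]]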
CodePudding user response:
Speeding things up with numpy is always the same process:
- Construct a big array containing all the information
- Do computations on the whole array at the same time
Here I use np.fromfunction to construct the big array from the formula you gave.
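If np.fromfunction is unfamiliar: it calls the given function once, passing one index array per axis (each of the requested shape), and returns whatever the function computes from them. A tiny toy illustration of my own:
import numpy as np

grid = np.fromfunction(lambda i, j: 10 * i + j, shape=(2, 3))
print(grid)
# [[ 0.  1.  2.]
#  [10. 11. 12.]]
# The index arrays are float by default; pass dtype=int if you want integer indices.
print(np.fromfunction(lambda i, j: 10 * i + j, shape=(2, 3), dtype=int))
# [[ 0  1  2]
#  [10 11 12]]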
So here is my ~60x speedup solution:
import numpy as np
from functools import partial

def formula(x, y, z, x_max, y_max, z_max):
    return x + y * x_max + z * x_max * y_max

def my_solution(x_max, y_max, z_max, seen):
    seen = np.unique(seen)
    results = np.fromfunction(
        partial(formula, x_max=x_max, y_max=y_max, z_max=z_max),
        shape=(x_max, y_max, z_max),
    )
    mask_not_seen = ~np.isin(results, seen)
    results_not_seen = results[mask_not_seen]
    indices_not_seen = np.where(mask_not_seen)
    return np.stack([*indices_not_seen, results_not_seen], axis=-1)
Let's check that it outputs the same as your solution:
x_max, y_max, z_max = 18, 9, 9
seen = np.random.randint(x_max * y_max * z_max, size=100)
op_out = op_solution(x_max, y_max, z_max, seen)
my_out = my_solution(x_max, y_max, z_max, seen)
assert np.all(op_out == my_out)
and that it is indeed quicker (~60x):
...: %timeit op_solution(x_max, y_max, z_max, seen)
...: %timeit my_solution(x_max, y_max, z_max, seen)
9.74 ms ± 37.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
161 µs ± 2.36 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
Finally with the values you gave:
...: seen = np.random.randint(x_max * y_max * z_max, size=1000)
...: %timeit my_solution(x_max=180, y_max=90, z_max=90, seen=seen)
242 ms ± 3.81 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)