Avoid three nested loops to search for all numbers not in a list-CodePudding

I want to find all the numbers that are not in a list and are equal to the following formula : x x_max * y x_max * y_max * z, where (x_max, y_max) are parameters and (x, y, z) can vary in a restricted set. I also want to know the values of (x, y, z) that I need to obtain each number.

I have succeeded to do this with the following code but it is currently extremely slow due to the three-nested loops. I am looking for a faster solution (maybe using NumPy arrays?).

What my code is actually doing with an extremely low speed (but works):

import numpy as np

x_max, y_max, z_max = 180, 90, 90

numbers = np.random.randint(x_max * y_max * z_max, size=1000)
results = np.array([[0, 0, 0, 0]]) # Initializes the array

for x in range(0, x_max):
    for y in range(0, y_max):
        for z in range(0, z_max):
            result = x   y * x_max   z * x_max * y_max
            if (result not in numbers):
                results = np.append(results, [[x, y, z, result]], axis=0)

# Initial line is useless
results = np.delete(results, (0), axis=0)

CodePudding user response：

Your nested loop and the calcuation:

for x in range(0, x_max):
    for y in range(0, y_max):
        for z in range(0, z_max):
            result = x   y * x_max   z * x_max * y_max

simply calculate all integers between 0 and x_max * y_max * z_max. And all the integers are unique as well: no integer is calculated twice.

That fact makes this a lot easier:

values = np.arange(x_max * y_max * z_max)
results = np.setdiff1d(values, numbers)

This will give you all the integers that have been calculated and are not in the numbers exclusion list.

Now you only miss the input x, y and z values. These, though, can be calculated from the actual results with some straightforward modulo arithmetic:

z = results // (x_max * y_max)
rem = results % (x_max * y_max)
y = rem // x_max
x = rem % x_max

Now you can stack it all nicely together:

results = np.array([x, y, z, results])

You can tweak your results array if needs be, e.g.:

results = results.T  # simple transpose
results = np.sort(results, axis=1)  # sort the inner list

I used the above to compare the output from this calculation with that of the triple-nested loop calculation. The results were indeed equal.

CodePudding user response：

Ok, so first of all, the main reason your code is slow is not the nested looping, it's because you're using np.append - this allocates an entirely new array with each append, and should almost never be used. You're much better off just using a python list, for which appending is an O(1) operation (internally, it only actually reallocates memory when the list grows past some multiple (I think something like 1 1/e times) its previous size).

The following should run somewhere on the order of 100x faster.

import numpy as np

x_max, y_max, z_max = 180, 90, 90

numbers = np.random.randint(x_max * y_max * z_max, size=1000)
# results = np.array([[0, 0, 0, 0]]) # Initializes the array <-- Unnecessary
results = []  # <-- Use a python list when you want to expand things

for x in range(0, x_max):
    print(f'x={x}')
    for y in range(0, y_max):
        for z in range(0, z_max):
            result = x   y * x_max   z * x_max * y_max
            if (result not in numbers):
                # results = np.append(results, [[x, y, z, result]], axis=0)  # <-- np.append is very slow O(N)
                results.append([x, y, z, result])  # <-- List.append is O(1)
results = np.array(results)

# Initial line is useless
# results = np.delete(results, (0), axis=0) <-- Unnecessary without the unnecessary initialization.

... but, we can still get faster using numpy vectorization.

# ... See above code for computation of "results"
print(f'Found result with python loop and list in {time.time() - t_start:.3f}s')

t_start = time.time()
# Get the x, y, z indices that you'd normally get from a nested loop.  They'll be in arrays of shape (x_max, y_max, z_max)
xs, ys, zs = np.meshgrid(np.arange(x_max), np.arange(y_max), np.arange(z_max), indexing='ij')
all_values = xs   ys * x_max   zs * x_max * y_max
valid_indices = ~np.isin(all_values, numbers)  # Get a shape (x_max, y_max, z_max) boolean mask
# Now use the mask to filter each array (yielding a flat (x_max*y_max*zmax) v[valid_indices] array)
# ... Then reshape it into a (x_max*y_max*zmax, 1) array
# ... So it can be stacked horizontally (h-stack) with the others along the second axis.
results_vectorized = np.hstack([v[valid_indices].reshape(-1, 1) for v in (xs, ys, zs, all_values)])
assert np.array_equal(results_vectorized, results)
print(f'Found result in {time.time() - t_start:.3f}s')

This is around 20x faster than the previous:

Found result with python loop and list in 3.630s
Found result in 0.154s

CodePudding user response：

Speeding things up with numpy is always the same process:

Construct a big array containing all the information
Do computations on the whole array at the same time

Here I use np.fromfunction to construct the big array from the formula you gave.

So here is my ~60x speedup solution:

import numpy as np
from functools import partial


def formula(x, y, z, x_max, y_max, z_max):
    return x   y * x_max   z * x_max * y_max


def my_solution(x_max, y_max, z_max, seen):
    seen = np.unique(seen)
    results = np.fromfunction(
        partial(formula, x_max=x_max, y_max=y_max, z_max=z_max),
        shape=(x_max, y_max, z_max),
    )
    mask_not_seen = ~np.isin(results, seen)
    results_not_seen = results[mask_not_seen]
    indices_not_seen = np.where(mask_not_seen)
    return np.stack([*indices_not_seen, results_not_seen], axis=-1)

Let's check that it outputs the same as your solution:

x_max, y_max, z_max = 18, 9, 9
seen = np.random.randint(x_max * y_max * z_max, size=100)

op_out = op_solution(x_max, y_max, z_max, seen)
my_out = my_solution(x_max, y_max, z_max, seen)

assert np.all(op_out == my_out)

and that it is indeed quicker (~60x):

...: %timeit op_solution(x_max, y_max, z_max, seen)
...: %timeit my_solution(x_max, y_max, z_max, seen)
9.74 ms ± 37.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
161 µs ± 2.36 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

Finally with the values you gave:

...: seen = np.random.randint(x_max * y_max * z_max, size=1000)
...: %timeit my_solution(x_max=180, y_max=90, z_max=90, seen=seen)
242 ms ± 3.81 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)