Home > Enterprise >  Avoid three nested loops to search for all numbers not in a list
Avoid three nested loops to search for all numbers not in a list

Time:06-25

I want to find all the numbers that are not in a list and are equal to the following formula : x x_max * y x_max * y_max * z, where (x_max, y_max) are parameters and (x, y, z) can vary in a restricted set. I also want to know the values of (x, y, z) that I need to obtain each number.

I have succeeded to do this with the following code but it is currently extremely slow due to the three-nested loops. I am looking for a faster solution (maybe using NumPy arrays?).

What my code is actually doing with an extremely low speed (but works):

import numpy as np

x_max, y_max, z_max = 180, 90, 90

numbers = np.random.randint(x_max * y_max * z_max, size=1000)
results = np.array([[0, 0, 0, 0]]) # Initializes the array

for x in range(0, x_max):
    for y in range(0, y_max):
        for z in range(0, z_max):
            result = x   y * x_max   z * x_max * y_max
            if (result not in numbers):
                results = np.append(results, [[x, y, z, result]], axis=0)

# Initial line is useless
results = np.delete(results, (0), axis=0)

CodePudding user response:

Your nested loop and the calcuation:

for x in range(0, x_max):
    for y in range(0, y_max):
        for z in range(0, z_max):
            result = x   y * x_max   z * x_max * y_max

simply calculate all integers between 0 and x_max * y_max * z_max. And all the integers are unique as well: no integer is calculated twice.

That fact makes this a lot easier:

values = np.arange(x_max * y_max * z_max)
results = np.setdiff1d(values, numbers)

This will give you all the integers that have been calculated and are not in the numbers exclusion list.

Now you only miss the input x, y and z values. These, though, can be calculated from the actual results with some straightforward modulo arithmetic:

z = results // (x_max * y_max)
rem = results % (x_max * y_max)
y = rem // x_max
x = rem % x_max

Now you can stack it all nicely together:

results = np.array([x, y, z, results])

You can tweak your results array if needs be, e.g.:

results = results.T  # simple transpose
results = np.sort(results, axis=1)  # sort the inner list

I used the above to compare the output from this calculation with that of the triple-nested loop calculation. The results were indeed equal.

CodePudding user response:

Ok, so first of all, the main reason your code is slow is not the nested looping, it's because you're using np.append - this allocates an entirely new array with each append, and should almost never be used. You're much better off just using a python list, for which appending is an O(1) operation (internally, it only actually reallocates memory when the list grows past some multiple (I think something like 1 1/e times) its previous size).

The following should run somewhere on the order of 100x faster.

import numpy as np

x_max, y_max, z_max = 180, 90, 90

numbers = np.random.randint(x_max * y_max * z_max, size=1000)
# results = np.array([[0, 0, 0, 0]]) # Initializes the array <-- Unnecessary
results = []  # <-- Use a python list when you want to expand things

for x in range(0, x_max):
    print(f'x={x}')
    for y in range(0, y_max):
        for z in range(0, z_max):
            result = x   y * x_max   z * x_max * y_max
            if (result not in numbers):
                # results = np.append(results, [[x, y, z, result]], axis=0)  # <-- np.append is very slow O(N)
                results.append([x, y, z, result])  # <-- List.append is O(1)
results = np.array(results)

# Initial line is useless
# results = np.delete(results, (0), axis=0) <-- Unnecessary without the unnecessary initialization.

... but, we can still get faster using numpy vectorization.

# ... See above code for computation of "results"
print(f'Found result with python loop and list in {time.time() - t_start:.3f}s')

t_start = time.time()
# Get the x, y, z indices that you'd normally get from a nested loop.  They'll be in arrays of shape (x_max, y_max, z_max)
xs, ys, zs = np.meshgrid(np.arange(x_max), np.arange(y_max), np.arange(z_max), indexing='ij')
all_values = xs   ys * x_max   zs * x_max * y_max
valid_indices = ~np.isin(all_values, numbers)  # Get a shape (x_max, y_max, z_max) boolean mask
# Now use the mask to filter each array (yielding a flat (x_max*y_max*zmax) v[valid_indices] array)
# ... Then reshape it into a (x_max*y_max*zmax, 1) array
# ... So it can be stacked horizontally (h-stack) with the others along the second axis.
results_vectorized = np.hstack([v[valid_indices].reshape(-1, 1) for v in (xs, ys, zs, all_values)])
assert np.array_equal(results_vectorized, results)
print(f'Found result in {time.time() - t_start:.3f}s')

This is around 20x faster than the previous:

Found result with python loop and list in 3.630s
Found result in 0.154s

CodePudding user response:

Speeding things up with numpy is always the same process:

  1. Construct a big array containing all the information
  2. Do computations on the whole array at the same time

Here I use np.fromfunction to construct the big array from the formula you gave.

So here is my ~60x speedup solution:

import numpy as np
from functools import partial


def formula(x, y, z, x_max, y_max, z_max):
    return x   y * x_max   z * x_max * y_max


def my_solution(x_max, y_max, z_max, seen):
    seen = np.unique(seen)
    results = np.fromfunction(
        partial(formula, x_max=x_max, y_max=y_max, z_max=z_max),
        shape=(x_max, y_max, z_max),
    )
    mask_not_seen = ~np.isin(results, seen)
    results_not_seen = results[mask_not_seen]
    indices_not_seen = np.where(mask_not_seen)
    return np.stack([*indices_not_seen, results_not_seen], axis=-1)

Let's check that it outputs the same as your solution:

x_max, y_max, z_max = 18, 9, 9
seen = np.random.randint(x_max * y_max * z_max, size=100)

op_out = op_solution(x_max, y_max, z_max, seen)
my_out = my_solution(x_max, y_max, z_max, seen)

assert np.all(op_out == my_out)

and that it is indeed quicker (~60x):

...: %timeit op_solution(x_max, y_max, z_max, seen)
...: %timeit my_solution(x_max, y_max, z_max, seen)
9.74 ms ± 37.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
161 µs ± 2.36 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

Finally with the values you gave:

...: seen = np.random.randint(x_max * y_max * z_max, size=1000)
...: %timeit my_solution(x_max=180, y_max=90, z_max=90, seen=seen)
242 ms ± 3.81 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
  • Related