Is NumPy any faster than default python when iterating over a list?

Time:07-21

I have a project where data is stored in a 2-dimensional array and each element needs to be analyzed, so I need to iterate through it. I just learned NumPy and was considering using it for this project, but after running some code I found that it ended up slower than just using a for loop.

import numpy
import time

list_ = [[num + 1 for num in range(5)] for lst in range(1000000)]
array = numpy.array([[num + 1 for num in range(5)] for lst in range(1000000)])

start = time.perf_counter()
count = 0
for lst in list_:
    for num in lst:
        count += 1
elapsed = time.perf_counter() - start
print(f"A default python list took {elapsed:0.5f} seconds to get through {count} elements.")

start = time.perf_counter()
count = 0
for num in numpy.nditer(array):
    count += 1
elapsed = time.perf_counter() - start
print(f"A numpy array took {elapsed:0.5f} seconds to get through {count} elements.")

The for loop takes on average ~0.5 seconds, while NumPy takes ~0.67 seconds. My run-through of NumPy syntax was somewhat surface level, so is there something I'm missing here that could run faster than the conventional for loop?

Answer:

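Yes, NumPy arrays are faster than Python lists when the work is done by NumPy's own functions. NumPy stores the data in a contiguous, typed buffer and runs its loops in compiled C and C++ code, which executes much faster than interpreted Python. A few operations (for example matrix multiplication through a BLAS library) can also split the work across several threads.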
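The speed difference comes largely from how the data is stored. A minimal sketch (the variable names are illustrative, and the sizes are arbitrary) that inspects the same numbers held in a list of lists and in an array:

```python
import sys

import numpy as np

# A list of lists: the list only stores pointers, and every number
# is a separate Python int object somewhere else in memory.
list_2d = [[num + 1 for num in range(5)] for _ in range(1000)]

# The equivalent array: one contiguous buffer of fixed-size machine integers.
arr_2d = np.array(list_2d)

print(arr_2d.dtype)                  # platform default integer type
print(arr_2d.itemsize)               # bytes per element
print(arr_2d.nbytes)                 # size of the raw data buffer in bytes
print(arr_2d.flags["C_CONTIGUOUS"])  # True: rows are stored back to back

# Each Python int carries its own object header
# (typically 28 bytes for a small int on 64-bit CPython).
print(sys.getsizeof(list_2d[0][0]))
```

Because the array's elements sit in one typed block, a compiled loop can walk through them without touching any Python objects.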

Answer:

import numpy as np

list_2D = [[num for num in range(1, 6)] for lst in range(100000)]
arr_2D = np.array(list_2D)
%timeit -n 10 -r 7 sum([sum(list_1D) for list_1D in list_2D])
> 11.5 ms ± 102 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit -n 10 -r 7 np.sum(arr_2D)
> 150 µs ± 10.4 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

For this example, computing the sum of all elements, the NumPy array is about 77 times faster (11.5 ms vs. 150 µs per loop).
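The two %timeit results give the speed-up directly. A quick check of the arithmetic, using only the numbers reported above:

```python
# Mean timings reported by %timeit above
list_time = 11.5e-3   # seconds: sum over the nested list
numpy_time = 150e-6   # seconds: np.sum over the array

# Speed-up factor: how many times faster np.sum is
ratio = list_time / numpy_time
print(f"np.sum is about {ratio:.0f}x faster")

# Fraction of the original run time that is saved
time_saved = 1 - numpy_time / list_time
print(f"it saves about {time_saved:.1%} of the run time")
```

So the factor is roughly 77, which corresponds to cutting the run time by close to 99%.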

Answer:

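Native Python is slow (often around 100x slower than equivalent C). NumPy is fast because its core is written in C and it stores its data in typed, contiguous C arrays. There is an overhead every time data has to pass between NumPy and Python, for example when a Python loop pulls elements out of an array one at a time.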
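One place that overhead shows up: each element a Python loop takes out of an array has to be wrapped in a new NumPy scalar object. A small sketch of that behavior (the scalar type printed depends on the platform; numpy.int64 is typical on 64-bit Linux and macOS):

```python
import numpy as np

arr = np.arange(1, 6) + np.zeros((3, 1), dtype=int)   # 3 rows of [1, 2, 3, 4, 5]

# Iterating an ndarray from Python creates a NumPy scalar object
# for every element handed back to the interpreter.
first = next(iter(arr.ravel()))
print(type(first))            # a NumPy integer type, such as numpy.int64

# .tolist() converts the whole array to nested Python lists in one call,
# so a Python loop over the result works with plain ints.
as_list = arr.tolist()
print(type(as_list[0][0]))    # <class 'int'>

# Best of all: keep the loop inside NumPy.
print(arr.sum())              # 45
```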

To use numpy effectively, you need to read the docs and find the appropriate numpy function, so that numpy runs vectorized loops internally instead of you resorting to a native Python loop.
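The question does not say what the per-element analysis is, so the test below (values greater than 2, counted per row) is a hypothetical stand-in. A minimal sketch contrasting a Python loop with the equivalent vectorized NumPy calls:

```python
import numpy as np

data = np.arange(1, 6) + np.zeros((4, 1), dtype=int)   # 4 rows of [1, 2, 3, 4, 5]

# Loop version: count the values greater than 2 in each row.
loop_counts = []
for row in data.tolist():
    loop_counts.append(sum(1 for value in row if value > 2))

# Vectorized version: the comparison and the sum both run in compiled code.
mask = data > 2                    # boolean array with the same shape as data
vector_counts = mask.sum(axis=1)   # number of True values in each row

print(loop_counts)     # [3, 3, 3, 3]
print(vector_counts)   # [3 3 3 3]

# np.where is an element-wise "if/else" with no Python loop:
# keep values greater than 2, replace the rest with 0.
clipped = np.where(data > 2, data, 0)
print(clipped[0])      # [0 0 3 4 5]
```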

Here is your code rewritten using compiled numpy functions; it's one or two orders of magnitude faster:

import numpy as np
import time

start = time.perf_counter()
list_ = [[num + 1 for num in range(5)] for lst in range(1000000)]
elapsed = time.perf_counter() - start
print(f"{elapsed:0.3f}s = native python list initialization")

start = time.perf_counter()
array = np.full((1000000,5), range(1,5+1))
elapsed = time.perf_counter() - start
print(f"{elapsed:0.3f}s = numpy list initialization for shape {array.shape}")

start = time.perf_counter()
count = 0
total = 0   # avoid shadowing the built-in sum()
for lst in list_:
    for num in lst:
        count += 1
        total += num
elapsed = time.perf_counter() - start
print(f"{elapsed:0.3f}s = native python list to sum {count} elements = {total}")

start = time.perf_counter()
count = array.size
total = np.sum(array)
elapsed = time.perf_counter() - start
print(f"{elapsed:0.3f}s = numpy array to sum {count} elements = {total}")

3.114s = native python list initialization
0.603s = numpy list initialization for shape (1000000, 5)
1.388s = native python list to sum 5000000 elements = 15000000
0.020s = numpy array to sum 5000000 elements = 15000000
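The np.full call works because the 5-element fill value is broadcast across every row. np.tile is another way to build the same array; this sketch uses a smaller, arbitrary row count just to compare the two results:

```python
import numpy as np

rows = 1000   # smaller than the 1,000,000 used above, only for comparison

# np.full broadcasts the 5-element fill value across every row.
a = np.full((rows, 5), range(1, 5 + 1))

# np.tile repeats a 1-D row; (rows, 1) means "rows copies down, 1 across".
b = np.tile(np.arange(1, 6), (rows, 1))

print(a.shape, b.shape)        # (1000, 5) (1000, 5)
print(np.array_equal(a, b))    # True
```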

Answer:

Reproducing your code, but with a much smaller range so we can actually look at the lists and arrays:

In [2]: alist = [[num + 1 for num in range(5)] for lst in range(10)]
In [3]: alist
Out[3]: 
[[1, 2, 3, 4, 5],
 [1, 2, 3, 4, 5],
 [1, 2, 3, 4, 5],
 [1, 2, 3, 4, 5],
 [1, 2, 3, 4, 5],
 [1, 2, 3, 4, 5],
 [1, 2, 3, 4, 5],
 [1, 2, 3, 4, 5],
 [1, 2, 3, 4, 5],
 [1, 2, 3, 4, 5]]

While we could make an array from that with np.array(alist), we can also make one by broadcasting a 5-element array against a "vertical" (10, 1) one:

In [4]: arr = np.arange(1,6)+np.zeros((10,1),int)
In [5]: arr
Out[5]: 
array([[1, 2, 3, 4, 5],
       [1, 2, 3, 4, 5],
       [1, 2, 3, 4, 5],
       [1, 2, 3, 4, 5],
       [1, 2, 3, 4, 5],
       [1, 2, 3, 4, 5],
       [1, 2, 3, 4, 5],
       [1, 2, 3, 4, 5],
       [1, 2, 3, 4, 5],
       [1, 2, 3, 4, 5]])
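How the (5,) and (10, 1) shapes combine follows NumPy's broadcasting rules. A short sketch of the shape arithmetic (np.broadcast_shapes needs NumPy 1.20 or later):

```python
import numpy as np

row = np.arange(1, 6)                  # shape (5,)
column = np.zeros((10, 1), dtype=int)  # shape (10, 1)

# Shapes are aligned from the right: (5,) is treated as (1, 5),
# and each size-1 axis is stretched to match the other operand.
print(np.broadcast_shapes(row.shape, column.shape))   # (10, 5)

arr = row + column
print(arr.shape)                       # (10, 5)

# The same rows without allocating a full copy: a read-only broadcast view.
view = np.broadcast_to(row, (10, 5))
print(np.array_equal(arr, view))       # True
```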

Your count loop:

In [6]: count = 0
   ...: for lst in alist:
   ...:     for num in lst:
   ...:         count += 1
   ...: 
In [7]: count
Out[7]: 50

And its time: here I use timeit, which repeats the run and reports the average time. In an IPython session it's very easy to use:

In [8]: %%timeit
   ...: count = 0
   ...: for lst in alist:
   ...:     for num in lst:
   ...:         count += 1
   ...: 
2.33 µs ± 0.833 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
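Outside IPython, the standard-library timeit module can make a comparable measurement. A sketch that mimics %timeit's "7 runs" and mean ± std. dev. report; the loop count of 10,000 is an arbitrary choice to keep it quick (%timeit picked 100,000 automatically):

```python
import statistics
import timeit

alist = [[num + 1 for num in range(5)] for _ in range(10)]

def count_loop():
    # The same nested counting loop as in the %%timeit cell
    count = 0
    for lst in alist:
        for num in lst:
            count += 1
    return count

number = 10_000                                  # calls per run
times = timeit.repeat(count_loop, repeat=7, number=number)

# Convert each run's total time to microseconds per call
per_loop = [t / number * 1e6 for t in times]
print(f"{statistics.mean(per_loop):.2f} µs ± "
      f"{statistics.stdev(per_loop):.2f} µs per loop (7 runs, {number} loops each)")
```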

Similar iteration on the 2d array is significantly slower:

In [9]: %%timeit
   ...: count = 0
   ...: for lst in arr:
   ...:     for num in lst:
   ...:         count += 1
   ...: 
18.1 µs ± 144 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

nditer is still slower than the list loop; usually nditer is slower than regular iteration. Here it's relatively fast because it isn't doing anything with the num variable, so this isn't a good test of its performance.

In [10]: %%timeit
    ...: count = 0
    ...: for num in np.nditer(arr):
    ...:     count += 1
    ...: 
7 µs ± 16.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
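Part of nditer's per-element cost is that, by default, it hands back each element as a 0-d array rather than a plain number. Its flags=["external_loop"] option instead yields 1-D chunks, which is closer to how it is meant to be used. A small sketch of both behaviors:

```python
import numpy as np

arr = np.arange(1, 6) + np.zeros((10, 1), dtype=int)

# Default mode: every element arrives as a 0-d array (a view into arr).
first = next(iter(np.nditer(arr)))
print(first.ndim)      # 0

# external_loop mode: 1-D chunks, so the Python loop body runs far fewer
# times and each chunk can be passed to a vectorized NumPy call.
count = 0
for chunk in np.nditer(arr, flags=["external_loop"]):
    count += chunk.size
print(count)           # 50
```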

But if we use the array as intended, we get something much better (and more so with a bigger arr):

In [11]: np.count_nonzero(arr)
Out[11]: 50
In [12]: timeit np.count_nonzero(arr)

960 ns ± 2.24 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
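np.count_nonzero also accepts an axis argument, which gives per-row or per-column counts without a loop. In this sketch a few values are set to zero (a hypothetical change, only so the counts differ):

```python
import numpy as np

arr = np.arange(1, 6) + np.zeros((10, 1), dtype=int)
arr[2, :3] = 0   # hypothetical: zero out three values in row 2

print(np.count_nonzero(arr))           # 47
print(np.count_nonzero(arr, axis=1))   # nonzero count for each row
print(np.count_nonzero(arr, axis=0))   # nonzero count for each column
```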

Another way: it's not so good with this small array, but I expect it will scale better than the list loop:

In [17]: timeit (arr>0).sum()
10.2 µs ± 32.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
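To check the scaling expectation, the same count can be timed on a much bigger input. A rough sketch with an arbitrary size of 200,000 rows; the timings will vary by machine, so none are claimed here:

```python
import time

import numpy as np

n_rows = 200_000   # arbitrary size, large enough to show the trend
alist = [[num + 1 for num in range(5)] for _ in range(n_rows)]
arr = np.array(alist)

# Python loop: count positive values one element at a time
start = time.perf_counter()
loop_count = 0
for lst in alist:
    for num in lst:
        if num > 0:
            loop_count += 1
loop_time = time.perf_counter() - start

# NumPy: build a boolean mask and sum it in compiled code
start = time.perf_counter()
numpy_count = int((arr > 0).sum())
numpy_time = time.perf_counter() - start

print(f"list loop:     {loop_time:.4f} s -> {loop_count}")
print(f"(arr>0).sum(): {numpy_time:.4f} s -> {numpy_count}")
```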

In sum: numpy can be faster if used right, but don't try to imitate Python list methods with it.
