Can I somehow speed up this simple code in python? That takes 0.221seconds for executing
for some_list in some_list_of_lists:
i = 1
while i < len(some_list):
a = some_list[i-1]
b = some_list[i]
i = 2
- some_list_of_lists ~110 000 items
- some_list ~10 items
Same code in cython takes only 0.004seconds, my goal is achieve 0.02seconds or less in python(without using cython)
full code:
result = []
for some_list in some_list_of_lists:
t1 = 1
t2 = 1
i = 1
while i < len(some_list):
a = some_list[i-1] #int
b = some_list[i] #int
t1 *= a
t2 *= b
i = 2
if t1 > t2:
result.append(some_list)
CodePudding user response:
The main problem here is that your implementation is very inefficient.
Compare:
from timeit import timeit
from random import random
import numpy as np # for a numpy solution
from math import prod # for a better Python solution
# just some random example data that is "some list of lists" (of numbers)
example_data = [
[random() * 10 for _ in range(100)]
for _ in range(100)
]
def f(some_list_of_lists):
# your code, in a function, so it's easy to time
result = []
for some_list in some_list_of_lists:
t1 = 1
t2 = 1
i = 1
while i < len(some_list):
a = some_list[i - 1] # int
b = some_list[i] # int
t1 *= a
t2 *= b
i = 2
if t1 > t2:
result.append(some_list)
return result
# the same data again, but in a numpy array
example_array = np.array(example_data)
def numpy_f(some_array):
# this function does exactly the same, it selects those 'lists' for which
# the product of the even elements is greater than the product of the odd ones
return some_array[
np.prod(some_array[:,::2], axis=1) > np.prod(some_array[:,1::2], axis=1)
]
def better_python_f(some_list_of_lists):
return [
xs for xs in some_list_of_lists if prod(xs[::2]) > prod(xs[1::2])
]
# showing that the results are the same
simple_example = [[2, 1, 1, 1], [1, 2, 1, 1], [2, 1, .5, 1], [.5, 1, 1, 1], [1, .5, 1, 1]]
print(f(simple_example))
print(numpy_f(np.array(simple_example)))
print(better_python_f(simple_example))
# timing running each 10,000 times on the large example data set
print('yours:', timeit(lambda: f(example_data), number=10000))
print('numpy:', timeit(lambda: numpy_f(example_array), number=10000))
print('mine :', timeit(lambda: better_python_f(example_data), number=10000))
Output:
[[2, 1, 1, 1], [1, 0.5, 1, 1]]
[[2. 1. 1. 1. ]
[1. 0.5 1. 1. ]]
[[2, 1, 1, 1], [1, 0.5, 1, 1]]
yours: 5.2597994999996445
numpy: 0.1987909999988915
mine : 0.7029579999998532
This shows that using just plain Python, without external libraries, you can easily get almost an order of magnitude faster. I'm sure someone can come up with an even faster method just in Python.
However, numpy
is perfect for this, and even the trivial solution I wrote is more than 25x faster than yours - and again I think someone more experienced in the application of numpy
can probably write an even faster solution using numpy
.
The key takeaway here shouldn't be "I need to use numpy because it's fast" though. The key thing is that you should think about what problem you're actually solving and then decide what the most efficient way is to do exactly that.
In your case, what the code actually does is "for a series of number series, decide if the product of numbers on even indices is greater than the product of numbers on odd incides for each number series, and keep only those where this is the case". To me, that quickly points at thinking about slicing, efficient ways to compute a product, and how to select and return the desired result.
Not only does the better_python_f
do exactly what it needs to, the code almost reads like a description of the problem:
[
xs for xs in some_list_of_lists if prod(xs[::2]) > prod(xs[1::2])
]
"A list of all xs
in the original list, where the product of even elements is greater than the product of odd elements."
Of course, it does require that you know about the Python concepts 'list comprehension' (x for x in .. if <condition on x>
) and 'slicing' (e.g. xs[::2]
), and that math
has a fairly efficient prod()
function to compute the product of elements of an iterable like a list.
CodePudding user response:
Threading and multiprocessing are both modules used to speed up functionality and get more done in less time. I'm not sure how much improvement you'll see in this example but these modules are great to know about in general.