When comparing Python generators and lists for performance optimisation, I read that generators are faster to create than lists, but iterating over a list is faster than iterating over a generator. However, I wrote an example to test this with a small and a large sample of data, and the two results contradict each other.
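The creation half of that claim can be checked on a small scale. This is a minimal sketch that assumes an illustrative size (N = 1_000_000, not taken from the question). Creating the generator does no filtering work, while the list comprehension computes every element immediately. Note that sys.getsizeof reports only the container itself (for the list, its array of references), not the integer objects it points to.

```python
import sys
import time

N = 1_000_000  # assumption: illustrative size, much smaller than in the question

start = time.perf_counter()
gen = (i for i in range(N) if i % 2 == 0)  # no elements are computed yet
gen_creation = time.perf_counter() - start

start = time.perf_counter()
lst = [i for i in range(N) if i % 2 == 0]  # all N // 2 elements are computed now
list_creation = time.perf_counter() - start

# getsizeof measures the container only, not the ints the list refers to.
print(f"generator: created in {gen_creation:.6f} s, {sys.getsizeof(gen)} bytes")
print(f"list:      created in {list_creation:.6f} s, {sys.getsizeof(lst)} bytes")
```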
When I time iteration over a generator and a list built from range(1_000_000_000), where the filtered sequence holds 500,000,000 numbers, generator iteration is faster than list iteration:
from time import time
my_generator = (i for i in range(1_000_000_000) if i % 2 == 0)
start = time()
for i in my_generator:
    pass
print("Time for Generator iteration - ", time() - start)
my_list = [i for i in range(1_000_000_000) if i % 2 == 0]
start = time()
for i in my_list:
    pass
print("Time for List iteration - ", time() - start)
And the output is:
Time for Generator iteration -  67.49345350265503
Time for List iteration -  89.21837282180786
But if I use a smaller input, 10_000_000 instead of 1_000_000_000, list iteration is faster than generator iteration:
from time import time
my_generator = (i for i in range(10_000_000) if i % 2 == 0)
start = time()
for i in my_generator:
    pass
print("Time for Generator iteration - ", time() - start)
my_list = [i for i in range(10_000_000) if i % 2 == 0]
start = time()
for i in my_list:
    pass
print("Time for list iteration - ", time() - start)
The output is:
Time for Generator iteration -  1.0233261585235596
Time for list iteration -  0.11701655387878418
Why is this behaviour happening?
Answer:
After working through the points made by @gimix and @Dani Mesejo, I found the answer: list iteration is indeed faster than generator iteration.
A generator is resumed, much like a function call, on every iteration, and each resumption also evaluates the remainder (modulus) test, which makes every step slower. With a list, that work is done once during creation, so the iteration itself is faster. Thus creating a list may be slower than creating a generator, but iterating over a list is definitely faster than iterating over a generator.
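The point about when the modulus test runs can be made visible with a small sketch. The helper is_even and the wrapper count_filter_calls are hypothetical, introduced only to log each filter call. For the generator, every call happens during iteration; for the list, every call happens during creation and iteration does none.

```python
def count_filter_calls():
    """Record how many filter calls have happened at each stage (hypothetical helper)."""
    log = []

    def is_even(n):
        # Hypothetical filter: logs each call so we can see *when* the work happens.
        log.append(n)
        return n % 2 == 0

    gen = (i for i in range(10) if is_even(i))
    after_gen_creation = len(log)    # the generator body has not run yet
    for _ in gen:
        pass
    after_gen_iteration = len(log)   # all filter calls ran during iteration

    log.clear()
    lst = [i for i in range(10) if is_even(i)]
    after_list_creation = len(log)   # all filter calls ran up front
    for _ in lst:
        pass
    after_list_iteration = len(log)  # walking the list does no filtering

    return after_gen_creation, after_gen_iteration, after_list_creation, after_list_iteration


print(count_filter_calls())  # (0, 10, 10, 10)
```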
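The code above uses the time module, which is not a reliable way to benchmark: time.time() reads the system clock, whereas timeit uses time.perf_counter() by default and temporarily turns off garbage collection while timing.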
I then used timeit with 1_000_000 and 1_000_000_000 items, and in both cases list iteration was faster:
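import timeit

# Setup runs once and is not timed: it only creates the generator.
mysetup = '''my_generator = (i for i in range(10_000_000) if i % 2 == 0)
'''
# Timed statement: the loop body must be indented inside the string.
mycode = '''
for i in my_generator:
    pass
'''

# Setup builds the whole list up front; only the loop below is timed.
mysetup1 = '''my_list = [i for i in range(10_000_000) if i % 2 == 0]'''
mycode1 = '''
for i in my_list:
    pass
'''

# number=1 because the generator is exhausted after a single pass.
print(timeit.timeit(setup=mysetup, stmt=mycode, number=1))
print(timeit.timeit(setup=mysetup1, stmt=mycode1, number=1))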
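Because a generator is exhausted after one pass, the setup-based measurement above can only use number=1. A minimal sketch of an alternative, assuming a smaller size (N = 1_000_000) and hypothetical helper names (iterate_generator, iterate_list, build_and_iterate_list), passes callables to timeit.repeat so every run gets a fresh generator. It also times building plus iterating the list, which is the cost that actually competes with the generator.

```python
import timeit

N = 1_000_000  # assumption: smaller than the question's sizes so it runs quickly


def iterate_generator():
    """Create a fresh generator and walk it; the modulus filter runs inside this loop."""
    count = 0
    for _ in (i for i in range(N) if i % 2 == 0):
        count += 1
    return count


precomputed = [i for i in range(N) if i % 2 == 0]  # built once, outside the timing


def iterate_list():
    """Walk a list that was already filtered; no modulus work happens here."""
    count = 0
    for _ in precomputed:
        count += 1
    return count


def build_and_iterate_list():
    """Build the filtered list and walk it, so the creation cost is counted too."""
    count = 0
    for _ in [i for i in range(N) if i % 2 == 0]:
        count += 1
    return count


results = {}
for name, fn in [("generator", iterate_generator),
                 ("list (iteration only)", iterate_list),
                 ("list (build + iterate)", build_and_iterate_list)]:
    # Best of several single runs, to reduce noise from other processes.
    results[name] = min(timeit.repeat(fn, number=1, repeat=3))
    print(f"{name:24} {results[name]:.4f} s")
```

Comparing the last two rows separates the cost of walking the values from the cost of producing them; the exact figures depend on the machine and the Python version.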