import asyncio
import time
from time import sleep

async def source_scraper(index):
    # sleep(1)
    await asyncio.sleep(1)
    # print(index)

async def source_scraper_head_sub():
    loop = asyncio.get_event_loop()
    links = [i for i in range(10_000)]
    start = time.time()
    tasks = [loop.run_in_executor(None, source_scraper, index) for index in enumerate(links)]
    await asyncio.gather(*tasks)
    print("time taken =", time.time() - start)

def runner():
    asyncio.run(source_scraper_head_sub())

runner()
Output:
time taken = 0.4620068073272705
C:\Apps\Python\lib\asyncio\base_events.py:1897: RuntimeWarning: coroutine 'source_scraper' was never awaited
handle = None # Needed to break cycles when an exception occurs.
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
CodePudding user response:
from multiprocessing import Pool
from time import time, sleep

def source_scraper(index):
    sleep(1)  # blocking sleep is fine here: each worker process waits independently
    return index

def source_scraper_head_sub():
    links = [i for i in range(100)]
    start = time()
    # Pool() defaults to os.cpu_count() worker processes; map() distributes
    # the links across them and blocks until every result is back.
    with Pool() as p:
        tasks = p.map(source_scraper, links)
    print(f"time taken = {time() - start:.2f}")
    return tasks

if __name__ == '__main__':
    data = source_scraper_head_sub()
    print(data[-5:])
Output:
time taken = 12.13
[95, 96, 97, 98, 99]
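As a side note on the timing: Pool() defaults to os.cpu_count() worker processes, so 100 one-second tasks take roughly ceil(100 / cpu_count) seconds; the 12.13 s above is consistent with an 8-core machine. Since these tasks only wait rather than burn CPU, a larger pool can be requested explicitly (the worker count below is an arbitrary illustration):

with Pool(processes=32) as p:  # 32 workers instead of the cpu_count() default
    tasks = p.map(source_scraper, links)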
CodePudding user response:
You are mixing the low-level event loop API with the high-level async/await syntax. loop.run_in_executor regards its argument as a normal function and simply calls it. Conceptually, an async def is a callable that returns a coroutine. So if you pass it to run_in_executor, it gets called, and the result (the coroutine) is discarded and never awaited, hence the warning. On the other hand, asyncio.gather accepts a list of coroutines, so you can pass the result of calling source_scraper directly to it, without mentioning loop at all.
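A minimal sketch that illustrates this point (my own example, not code from the question):

import asyncio

async def source_scraper(index):
    await asyncio.sleep(1)

coro = source_scraper(0)  # calling the async def does NOT run it...
print(type(coro))         # <class 'coroutine'> -- it only creates a coroutine object
coro.close()              # close it explicitly to avoid the "never awaited" warning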
To quote the asyncio docs:
Application developers should typically use the high-level asyncio functions, such as asyncio.run(), and should rarely need to reference the loop object or call its methods.
With async/await (recommended)
Remove the call to run_in_executor and change the line tasks = [...] into

tasks = [source_scraper(index) for index in enumerate(links)]

and everything runs in 1 second thanks to the non-blocking nature of asyncio.sleep.
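Put together, the corrected script would look like this (a runnable sketch of the change described above; note that index is an (i, link) tuple here, which is harmless because source_scraper ignores it):

import asyncio
import time

async def source_scraper(index):
    await asyncio.sleep(1)  # non-blocking: the event loop runs other tasks meanwhile

async def source_scraper_head_sub():
    links = [i for i in range(10_000)]
    start = time.time()
    # Calling source_scraper() only creates coroutines; gather() schedules and awaits them.
    tasks = [source_scraper(index) for index in enumerate(links)]
    await asyncio.gather(*tasks)
    print("time taken =", time.time() - start)

asyncio.run(source_scraper_head_sub())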
With event loop API
If for some reason you want to use the event loop, change source_scraper into a normal function

def source_scraper(index):
    sleep(1)
    print(index)

and keep everything else. Beware that this behaves differently from the previous approach: time.sleep is blocking, and if the pool is not big enough, each thread has to handle more than one task sequentially, resulting in a longer running time.
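For completeness, a runnable sketch of this variant with an explicit thread pool, so the pool size is visible (passing None instead, as in the question, uses asyncio's default executor; the 100 links and 20 workers are arbitrary choices to keep the demo short):

import asyncio
import time
from time import sleep
from concurrent.futures import ThreadPoolExecutor

def source_scraper(index):
    sleep(1)  # blocking, so it must run on a worker thread, not on the event loop

async def source_scraper_head_sub():
    loop = asyncio.get_running_loop()  # modern replacement for get_event_loop() inside a coroutine
    links = [i for i in range(100)]
    start = time.time()
    with ThreadPoolExecutor(max_workers=20) as pool:
        tasks = [loop.run_in_executor(pool, source_scraper, index) for index in enumerate(links)]
        await asyncio.gather(*tasks)
    # 100 one-second tasks spread over 20 threads take about 5 seconds.
    print("time taken =", time.time() - start)

asyncio.run(source_scraper_head_sub())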