Can anyone explain why this async function gives an error in Python? I'm a newbie to async in pyth

Time:09-21

import asyncio
import time
from time import sleep

async def source_scraper(index):
    # sleep(1)
    await asyncio.sleep(1)
    # print(index)

async def source_scraper_head_sub():
    loop = asyncio.get_event_loop()
    links = [i for i in range(10_000)]
    start = time.time()
    tasks = [loop.run_in_executor(None, source_scraper, index) for index in enumerate(links)]
    await asyncio.gather(*tasks)
    print("time taken =", time.time() - start)

def runner():
    asyncio.run(source_scraper_head_sub())

runner()

Output:

time taken = 0.4620068073272705
C:\Apps\Python\lib\asyncio\base_events.py:1897: RuntimeWarning: coroutine 'source_scraper' was never awaited
  handle = None  # Needed to break cycles when an exception occurs.
RuntimeWarning: Enable tracemalloc to get the object allocation traceback

CodePudding user response:

from multiprocessing import Pool
from time import time, sleep

def source_scraper(index):
    sleep(1)
    return index

def source_scraper_head_sub():
    links = [i for i in range(100)]
    start = time()
    with Pool() as p:
        tasks = p.map(source_scraper, links)
    print(f"time taken = {time() - start:.2f}")
    return tasks

if __name__ == '__main__':
    data = source_scraper_head_sub()
    print(data[-5:])

Output:

time taken = 12.13
[95, 96, 97, 98, 99]

CodePudding user response:

You are mixing the low-level event loop API with the high-level async/await syntax.

loop.run_in_executor treats its argument as a normal (synchronous) function and simply calls it in a worker thread.

Conceptually, an async def is a callable that returns a coroutine. So if you pass it to run_in_executor, it gets called, and the result (the coroutine object) is never awaited, hence the warning. That is also why the reported time is far below 1 second: the body of source_scraper never runs. On the other hand, asyncio.gather accepts awaitables such as coroutines as its positional arguments, so you can pass the results of calling source_scraper to it directly, without mentioning loop.
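To make the point about "a callable that returns a coroutine" concrete, here is a minimal sketch. The helper names call_without_awaiting and call_and_await are made up for illustration, and the sleep is shortened so it runs quickly.

```python
import asyncio
import inspect


async def source_scraper(index):
    # Shortened delay for the demo; the question uses 1 second.
    await asyncio.sleep(0.01)
    return index


def call_without_awaiting(index):
    """Call the coroutine function the way a plain caller (e.g. a worker thread) would."""
    result = source_scraper(index)      # the body has NOT run yet
    is_coro = inspect.iscoroutine(result)
    result.close()                      # close it so no "never awaited" warning is emitted
    return is_coro


async def call_and_await(index):
    """Awaiting the coroutine is what actually runs the body."""
    return await source_scraper(index)


if __name__ == "__main__":
    print(call_without_awaiting(3))         # True
    print(asyncio.run(call_and_await(3)))   # 3
```

The first call only builds a coroutine object; nothing sleeps and nothing is returned from the body until something awaits it.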

To quote asyncio docs:

Application developers should typically use the high-level asyncio functions, such as asyncio.run(), and should rarely need to reference the loop object or call its methods.

With async/await (recommended)

Remove the call to run_in_executor and change the line tasks = [...] into

tasks = [source_scraper(index) for index in enumerate(links)]

and everything finishes in about 1 second, thanks to the non-blocking nature of asyncio.sleep.
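Put together, the async/await version could look like the sketch below. The n_links and delay parameters and the returned (results, elapsed) tuple are additions for illustration and are not part of the original code. The list is iterated directly rather than through enumerate, so each index is a plain int.

```python
import asyncio
import time


async def source_scraper(index, delay=1.0):
    # Non-blocking sleep: the event loop can run other coroutines meanwhile.
    await asyncio.sleep(delay)
    return index


async def source_scraper_head_sub(n_links=10_000, delay=1.0):
    links = list(range(n_links))
    start = time.perf_counter()
    # gather takes awaitables as positional arguments and returns
    # their results in the same order they were passed in.
    results = await asyncio.gather(*(source_scraper(i, delay) for i in links))
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = asyncio.run(source_scraper_head_sub())
    print(f"time taken = {elapsed:.2f}")
    print(results[-5:])
```

All sleeps overlap on a single thread, so the total should stay close to one delay plus scheduling overhead.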

With event loop API

If for some reason you want to use the event loop API, change source_scraper to a normal function:

def source_scraper(index):
    sleep(1)
    print(index)

and keep everything else. Beware that this behaves differently from the previous approach. time.sleep is blocking, so each call occupies a worker thread for its full duration. If the thread pool is not big enough, each thread has to handle more than one task sequentially, resulting in a longer running time.
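The effect of the pool size can be seen in a small sketch. It passes an explicit ThreadPoolExecutor instead of None so the number of workers is visible; max_workers, n_links and delay are illustrative parameters, not part of the original code.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def source_scraper(index, delay=1.0):
    # Blocking sleep: this ties up one worker thread for the whole delay.
    time.sleep(delay)
    return index


async def source_scraper_head_sub(n_links=20, max_workers=5, delay=1.0):
    loop = asyncio.get_running_loop()
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # run_in_executor(executor, func, *args) returns an awaitable future.
        futures = [
            loop.run_in_executor(pool, source_scraper, i, delay)
            for i in range(n_links)
        ]
        results = await asyncio.gather(*futures)
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = asyncio.run(source_scraper_head_sub())
    # 20 tasks on 5 threads run in 4 waves, so expect roughly 4 seconds.
    print(f"time taken = {elapsed:.2f}")
```

With more tasks than threads the running time grows roughly as n_links / max_workers times the delay, whereas the asyncio.sleep version stays near a single delay.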
