import asyncio
import time
from time import sleep

async def source_scraper(index):
    # sleep(1)
    await asyncio.sleep(1)
    # print(index)

async def source_scraper_head_sub():
    loop = asyncio.get_event_loop()
    links = [i for i in range(10_000)]
    start = time.time()
    tasks = [loop.run_in_executor(None, source_scraper, index) for index in enumerate(links)]
    await asyncio.gather(*tasks)
    print("time taken =", time.time() - start)

def runner():
    asyncio.run(source_scraper_head_sub())

runner()
Output:
time taken = 0.4620068073272705
C:\Apps\Python\lib\asyncio\base_events.py:1897: RuntimeWarning: coroutine 'source_scraper' was never awaited
handle = None # Needed to break cycles when an exception occurs.
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
CodePudding user response:
from multiprocessing import Pool
from time import time, sleep

def source_scraper(index):
    sleep(1)  # blocking sleep is fine here: each worker process waits independently
    return index

def source_scraper_head_sub():
    links = [i for i in range(100)]
    start = time()
    # Pool() defaults to os.cpu_count() worker processes; map() distributes
    # the links across them and blocks until every result is back.
    with Pool() as p:
        tasks = p.map(source_scraper, links)
    print(f"time taken = {time() - start:.2f}")
    return tasks

if __name__ == '__main__':
    data = source_scraper_head_sub()
    print(data[-5:])
Output:
time taken = 12.13
[95, 96, 97, 98, 99]
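As a side note on the timing: Pool() defaults to os.cpu_count() worker processes, so 100 one-second tasks take roughly ceil(100 / cpu_count) seconds; the 12.13 s above is consistent with an 8-core machine. Since these tasks only wait rather than burn CPU, a larger pool can be requested explicitly (the worker count below is an arbitrary illustration):

with Pool(processes=32) as p:  # 32 workers instead of the cpu_count() default
    tasks = p.map(source_scraper, links)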
CodePudding user response:
You are mixing the low-level event loop API with the high-level async/await syntax. loop.run_in_executor regards its argument as a normal function and simply calls it. Conceptually, an async def is a callable that returns a coroutine. So if you pass it to run_in_executor, it gets called, and the result (the coroutine) is discarded and never awaited, hence the warning. On the other hand, asyncio.gather accepts a list of coroutines, so you can pass the result of calling source_scraper directly to it, without mentioning loop at all.
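A minimal sketch that illustrates this point (my own example, not code from the question):

import asyncio

async def source_scraper(index):
    await asyncio.sleep(1)

coro = source_scraper(0)  # calling the async def does NOT run it...
print(type(coro))         # <class 'coroutine'> -- it only creates a coroutine object
coro.close()              # close it explicitly to avoid the "never awaited" warning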
To quote the asyncio docs:
Application developers should typically use the high-level asyncio functions, such as asyncio.run(), and should rarely need to reference the loop object or call its methods.
With async/await (recommended)
Remove the call to run_in_executor and change the line tasks = [...] into

tasks = [source_scraper(index) for index in enumerate(links)]

and everything runs in 1 second thanks to the non-blocking nature of asyncio.sleep.
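Put together, the corrected script would look like this (a runnable sketch of the change described above; note that index is an (i, link) tuple here, which is harmless because source_scraper ignores it):

import asyncio
import time

async def source_scraper(index):
    await asyncio.sleep(1)  # non-blocking: the event loop runs other tasks meanwhile

async def source_scraper_head_sub():
    links = [i for i in range(10_000)]
    start = time.time()
    # Calling source_scraper() only creates coroutines; gather() schedules and awaits them.
    tasks = [source_scraper(index) for index in enumerate(links)]
    await asyncio.gather(*tasks)
    print("time taken =", time.time() - start)

asyncio.run(source_scraper_head_sub())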
With event loop API
If for some reason you want to use the event loop, change source_scraper into a normal function

def source_scraper(index):
    sleep(1)
    print(index)

and keep everything else. Beware that this behaves differently from the previous approach: time.sleep is blocking, and if the pool is not big enough, each thread has to handle more than one task sequentially, resulting in a longer running time.
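For completeness, a runnable sketch of this variant with an explicit thread pool, so the pool size is visible (passing None instead, as in the question, uses asyncio's default executor; the 100 links and 20 workers are arbitrary choices to keep the demo short):

import asyncio
import time
from time import sleep
from concurrent.futures import ThreadPoolExecutor

def source_scraper(index):
    sleep(1)  # blocking, so it must run on a worker thread, not on the event loop

async def source_scraper_head_sub():
    loop = asyncio.get_running_loop()  # modern replacement for get_event_loop() inside a coroutine
    links = [i for i in range(100)]
    start = time.time()
    with ThreadPoolExecutor(max_workers=20) as pool:
        tasks = [loop.run_in_executor(pool, source_scraper, index) for index in enumerate(links)]
        await asyncio.gather(*tasks)
    # 100 one-second tasks spread over 20 threads take about 5 seconds.
    print("time taken =", time.time() - start)

asyncio.run(source_scraper_head_sub())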