asyncio wait - process results as they come

Time:04-22

This script should take a list of initial tasks (URLs) and make requests asynchronously with aiohttp, and that part works. The problem is that asyncio.wait doesn't return the actual results, only the sets of done and pending tasks, so I can't figure out where and how to process the results as they come in, in order to make more requests and write data to the DB. In this variant I create the new task (make more requests...) inside the first one, which doesn't work. PS: I am using wait because a book I am reading suggests it for more control over done and pending tasks and exceptions. I'd appreciate any help :)

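import asyncio

from aiohttp import ClientSession
from bs4 import BeautifulSoup

async def fetch_content_2(session, url):
    async with session.get(url) as result:
        res = await result.text()
        try:
            new_link = BeautifulSoup(res, 'lxml').select_one('element on website 2')['href']
            # ***PROCESS AND WRITE SOME DATA TO DB***
        except Exception:  # e.g. TypeError when select_one finds nothing and returns None
            pass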

async def fetch_content_1(session, url):
    async with session.get(url) as result:
        res = await result.text()
        try:
            link = BeautifulSoup(res, 'lxml').select_one('element on website 1')['href']
            # ***MAKE ANOTHER ASYNC REQUEST WITH NEW LINK***
            asyncio.create_task(fetch_content_1(session, link))
        except Exception:  # e.g. TypeError when select_one finds nothing and returns None
            pass

async def main(tasks):
    async with ClientSession() as session:
        pending = [asyncio.create_task(fetch_content_1(session, url)) for url in tasks]
        while pending:
            done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)

            # print(f'Done count: {len(done)}')
            # print(f'Pending count: {len(pending)}')

asyncio.run(main([url1, url2, ...]))

            

Answer:

done and pending are sets of asyncio.Task objects. To get a task's result or state, take the tasks out of the set and call the method you need (see the asyncio.Task documentation). In particular, task.result() returns whatever the coroutine returned, so the fetch functions need to return a value for it to be useful.

async def main(tasks):
    async with ClientSession() as session:
        pending = [asyncio.create_task(fetch_content_1(session, url)) for url in tasks]
        while pending:
            done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
            # More than one task can finish in the same iteration, so handle them all.
            for task in done:
                res = task.result()  # value returned by fetch_content_1
                # do some stuff with the result
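
To make follow-up requests as results arrive, one option is to have each coroutine return its data and let the main loop add new tasks to pending, rather than creating tasks inside the fetch function. The following is only a sketch under assumptions: fetch_stage_1, fetch_stage_2, crawl and FAKE_LINKS are hypothetical stand-ins that simulate the two sites with asyncio.sleep, so no aiohttp or network access is involved.

```python
import asyncio

# Hypothetical data: each first-stage URL "contains" one follow-up link.
FAKE_LINKS = {"url1": "url1/detail", "url2": "url2/detail"}

async def fetch_stage_1(url):
    # Stand-in for fetching website 1 and extracting the link.
    await asyncio.sleep(0.01)
    return ("stage1", FAKE_LINKS[url])

async def fetch_stage_2(url):
    # Stand-in for fetching website 2 and extracting the data.
    await asyncio.sleep(0.01)
    return ("stage2", f"data from {url}")

async def crawl(urls):
    pending = {asyncio.create_task(fetch_stage_1(u)) for u in urls}
    rows = []
    while pending:
        done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            stage, payload = task.result()
            if stage == "stage1":
                # Schedule the follow-up request; the loop keeps running until it finishes too.
                pending.add(asyncio.create_task(fetch_stage_2(payload)))
            else:
                # This is where the data would be written to the DB.
                rows.append(payload)
    return rows

if __name__ == "__main__":
    print(asyncio.run(crawl(["url1", "url2"])))
```

Because the new tasks go into the same pending set, the while loop only ends once every follow-up request has completed, and nothing is left running in the background when the session closes.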

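Check the documentation for the exceptions that result() and the related methods can raise. result() re-raises the exception if the task failed internally, raises CancelledError if the task was cancelled, and raises InvalidStateError if the result is not ready yet (which shouldn't happen here, since the task comes from done).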
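
As a rough illustration of the point about exceptions, a finished task can be checked with task.exception() before calling task.result(). This is a sketch only: work and collect are hypothetical names, and work simulates a request that fails for odd inputs.

```python
import asyncio

async def work(n):
    # Hypothetical stand-in for a fetch coroutine; odd inputs fail.
    await asyncio.sleep(0.01 * n)
    if n % 2:
        raise ValueError(f"bad input {n}")
    return n * 10

async def collect(inputs):
    pending = {asyncio.create_task(work(n)) for n in inputs}
    results, errors = [], []
    while pending:
        done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            # exception() returns None if the task finished normally.
            # (It raises CancelledError for a cancelled task; none are cancelled here.)
            exc = task.exception()
            if exc is not None:
                errors.append(exc)
            else:
                results.append(task.result())
    return results, errors

if __name__ == "__main__":
    results, errors = asyncio.run(collect([1, 2, 3, 4]))
    print(sorted(results), [str(e) for e in errors])
```

Wrapping task.result() in try/except would work as well; checking exception() first just keeps the failure path separate from the normal one.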
