This script should take a list of initial tasks (URLs) and make asynchronous requests with aiohttp. That part works. The problem is that asyncio.wait doesn't return actual results, only the sets of done and pending tasks, so I can't figure out where and how to process the results as they come in, in order to make more requests and write data to the DB. In this variant I placed the creation of a new task (make more requests...) inside the first one, which doesn't work.
PS. I am using wait because a book I am reading suggests using it for more control over done and pending tasks and exceptions. I'd appreciate any help :)

import asyncio

from aiohttp import ClientSession
from bs4 import BeautifulSoup


async def fetch_content_2(session, url):
    async with session.get(url) as result:
        res = await result.text()
        try:
            new_link = BeautifulSoup(res, 'lxml').select_one('element on website 2')['href']
            # ***PROCESS AND WRITE SOME DATA TO DB***
        except Exception:
            # element or href missing: skip this page
            pass


async def fetch_content_1(session, url):
    async with session.get(url) as result:
        res = await result.text()
        try:
            link = BeautifulSoup(res, 'lxml').select_one('element on website 1')['href']
            # ***MAKE ANOTHER ASYNC REQUEST WITH NEW LINK***
            asyncio.create_task(fetch_content_1(session, link))
        except Exception:
            # element or href missing: skip this page
            pass


async def main(tasks):
    async with ClientSession() as session:
        pending = [asyncio.create_task(fetch_content_1(session, url)) for url in tasks]
        while pending:
            done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
            # print(f'Done count: {len(done)}')
            # print(f'Pending count: {len(pending)}')


asyncio.run(main([url1, url2, ...]))

CodePudding user response:
done and pending are sets of asyncio.Task objects. If you want a task's result or its state, you have to take the tasks out of the sets and call the method you need on them; check the (docs). Specifically, you can get the result by calling the task's result() method.

async def main(tasks):
    async with ClientSession() as session:
        pending = [asyncio.create_task(fetch_content_1(session, url)) for url in tasks]
        while pending:
            done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
            # more than one task can finish in the same round, so handle each one
            for task in done:
                res = task.result()
                # do some stuff with the result

Check the documentation for the exceptions that result() and related methods can raise. An exception may be raised if the task itself failed with an error, if it was cancelled, or if the result is not ready yet (which shouldn't happen here, because the tasks come from the done set).
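
To tie this back to the question, here is a minimal sketch of one way to process results as they arrive and schedule follow-up requests from the main loop instead of from inside the coroutine. It is not the asker's real code: fetch is a hypothetical stand-in for the aiohttp request (it only sleeps and returns a made-up follow-up URL), crawl is a hypothetical name for the loop, and the URLs "a", "b" and "bad" are placeholders. The idea is that the coroutine returns the new link, and the loop adds a new task to pending.

```python
import asyncio


async def fetch(url):
    # Hypothetical stand-in for an aiohttp request plus HTML parsing.
    # Returns a follow-up URL for first-level pages, None for follow-up
    # pages, and raises for any URL containing "bad".
    await asyncio.sleep(0)
    if "bad" in url:
        raise ValueError(f"cannot fetch {url}")
    return None if url.endswith("/next") else url + "/next"


async def crawl(urls):
    results, errors = [], []
    # Name each task after its URL so it can be identified once it is done.
    pending = {asyncio.create_task(fetch(url), name=url) for url in urls}
    while pending:
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED
        )
        for task in done:  # several tasks may finish in the same round
            if task.cancelled():
                continue  # exception() would raise CancelledError here
            exc = task.exception()
            if exc is not None:
                # exception() hands back the error instead of re-raising it
                errors.append((task.get_name(), exc))
                continue
            new_link = task.result()
            results.append(task.get_name())  # a DB write would go here
            if new_link is not None:
                # Schedule the follow-up request from the main loop.
                pending.add(asyncio.create_task(fetch(new_link), name=new_link))
    return results, errors


results, errors = asyncio.run(crawl(["a", "b", "bad"]))
```

Because the new tasks are added to pending, the while loop keeps waiting for them too, which is what gets lost when create_task is called inside the coroutine and nothing awaits the task it creates. With real aiohttp code, fetch would take the session as an extra argument.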