Asyncio gather difference

Time:10-11

From my understanding, both code blocks do the same thing. Why is there a difference in execution time?

import asyncio
import time
...

# Block 1:
start_time = time.time()
tasks = [
    get_from_knowledge_v2(...),
    get_from_knowledge_v2(...),
    get_from_knowledge_v2(...),
]
data_list = await asyncio.gather(*tasks)
print("TIME TAKEN::", time.time() - start_time)

# Block 2:
start_time = time.time()
data1 = await get_from_knowledge_v2(...)
data2 = await get_from_knowledge_v2(...)
data3 = await get_from_knowledge_v2(...)
print("WITHOUT ASYNCIO GATHER TIME TAKEN::", time.time() - start_time)

Result:

TIME TAKEN:: 0.6016566753387451
WITHOUT ASYNCIO GATHER TIME TAKEN:: 1.7620849609375

Answer:

The asyncio.gather function runs the awaitables you pass to it concurrently. That means that whenever one of them is waiting on I/O, the event loop can switch to another one and make progress there instead of sitting idle. As a result, the waiting periods overlap, which gives you a certain degree of parallelism even though everything still runs on a single thread.
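Since Block 1 in the question stores the result of gather in data_list, note that gather returns the results in the order the awaitables were passed in, not in the order they finish. A minimal sketch, where fetch is a hypothetical stand-in for get_from_knowledge_v2 and asyncio.sleep simulates the I/O delay:

```python
import asyncio


async def fetch(label: str, delay: float) -> str:
    # Hypothetical stand-in for an I/O-bound call; asyncio.sleep yields to the event loop
    await asyncio.sleep(delay)
    return label


async def main() -> list[str]:
    # "slow" is passed first but finishes last
    return await asyncio.gather(
        fetch("slow", 0.3),
        fetch("medium", 0.2),
        fetch("fast", 0.1),
    )


results = asyncio.run(main())
print(results)  # ['slow', 'medium', 'fast']
```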

In this case I assume that get_from_knowledge_v2 makes some HTTP request using non-blocking, asynchronous I/O.
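That assumption matters. If get_from_knowledge_v2 used a blocking call internally, gather could not overlap anything, because a blocking call never hands control back to the event loop. A rough sketch, assuming time.sleep as a stand-in for a blocking client and asyncio.sleep for a non-blocking one (the coroutine names are hypothetical):

```python
import asyncio
import time
from typing import Awaitable, Callable


async def non_blocking_call() -> None:
    # Awaiting asyncio.sleep suspends this coroutine and lets the event loop run others
    await asyncio.sleep(0.2)


async def blocking_call() -> None:
    # time.sleep blocks the whole thread (like a synchronous HTTP client would),
    # so the event loop cannot switch to another coroutine in the meantime
    time.sleep(0.2)


async def timed_gather(factory: Callable[[], Awaitable[None]]) -> float:
    start = time.perf_counter()
    await asyncio.gather(factory(), factory(), factory())
    return time.perf_counter() - start


async def main() -> tuple[float, float]:
    return await timed_gather(non_blocking_call), await timed_gather(blocking_call)


non_blocking_time, blocking_time = asyncio.run(main())
# Roughly 0.2 s versus roughly 0.6 s
print(f"non-blocking: {non_blocking_time:.2f}s, blocking: {blocking_time:.2f}s")
```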

In the second code block there is no concurrency between the three get_from_knowledge_v2 calls. Instead, you execute them sequentially (with respect to each other). In other words, while you are awaiting the first one, the second one has not even started: the surrounding coroutine is suspended at each await until that call has finished.
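The difference in start order can be made visible with a small event log. This sketch uses a hypothetical call coroutine in place of get_from_knowledge_v2. It also shows that building the list of coroutines, as Block 1 does, runs nothing yet; the work only starts once gather schedules them.

```python
import asyncio

events: list[str] = []


async def call(n: int) -> None:
    # Hypothetical stand-in for one get_from_knowledge_v2(...) call
    events.append(f"start {n}")
    await asyncio.sleep(0.05)
    events.append(f"end {n}")


async def sequential() -> list[str]:
    events.clear()
    for n in (1, 2, 3):
        await call(n)  # call(n + 1) is not even created until this one finishes
    return list(events)


async def concurrent() -> list[str]:
    events.clear()
    coros = [call(n) for n in (1, 2, 3)]  # creating coroutine objects runs nothing
    assert events == []
    await asyncio.gather(*coros)
    return list(events)


seq_events = asyncio.run(sequential())
con_events = asyncio.run(concurrent())
print(seq_events)  # ['start 1', 'end 1', 'start 2', 'end 2', 'start 3', 'end 3']
print(con_events)  # every "start" appears before any "end"
```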

Note: This does not mean that no concurrency is happening or possible outside of that code block. If this sequential code block is inside an async function (i.e. a coroutine), you can execute that function concurrently with some other coroutine. It is just that inside that code block, those get_from_knowledge_v2 coroutines are executed one after another.

The times you measured confirm this rather nicely: with three coroutines, gather lets them run almost in parallel (about 0.6 seconds in total), while the other code block runs them sequentially, which takes almost three times as long (about 1.76 seconds).

PS

Maybe a minimal concrete example will help illustrate what I mean:

from asyncio import gather, run, sleep
from time import time


async def sleep_and_print(seconds: float) -> None:
    await sleep(seconds)
    print("slept", seconds, "seconds")


async def concurrent_sleeps() -> None:
    await gather(
        sleep_and_print(3),
        sleep_and_print(2),
        sleep_and_print(1),
    )


async def sequential_sleeps() -> None:
    await sleep_and_print(3)
    await sleep_and_print(2)
    await sleep_and_print(1)


async def something_else() -> None:
    print("Doing something else that takes 4 seconds...")
    await sleep(4)
    print("Done with something else!")


async def main() -> None:
    start = time()
    await concurrent_sleeps()
    print("concurrent_sleeps took", round(time() - start, 1), "seconds\n")

    start = time()
    await sequential_sleeps()
    print("sequential_sleeps took", round(time() - start, 1), "seconds\n")

    start = time()
    await gather(
        sequential_sleeps(),
        something_else(),
    )
    print("sequential_sleeps & something_else together took", round(time() - start, 1), "seconds")


if __name__ == '__main__':
    run(main())

Running that script gives the following output:

slept 1 seconds
slept 2 seconds
slept 3 seconds
concurrent_sleeps took 3.0 seconds

slept 3 seconds
slept 2 seconds
slept 1 seconds
sequential_sleeps took 6.0 seconds

Doing something else that takes 4 seconds...
slept 3 seconds
Done with something else!
slept 2 seconds
slept 1 seconds
sequential_sleeps & something_else together took 6.0 seconds

This illustrates that inside concurrent_sleeps the sleeping was done almost in parallel, with the 1-second sleep finishing first, then the 2-second sleep, then the 3-second sleep.

It also shows that inside sequential_sleeps the sleeping is done sequentially and in call order, meaning it first slept 3 seconds, then 2 seconds, then 1 second.

And finally, executing sequential_sleeps concurrently with something_else shows that the two are executed almost in parallel: the 3-second sleep finished first (after 3 seconds), one second later something_else finished, another second later the 2-second sleep, and after yet another second the 1-second sleep. Together they still took approximately 6 seconds, not 6 + 4 = 10.

That last part is what I meant when I said you can still execute another coroutine concurrently with the sequential block of code. In itself, the code block always remains sequential.

I hope this is clearer now.

PPS

Just to throw another option into the mix, you can also achieve concurrency by using Tasks. Calling asyncio.create_task immediately schedules the coroutine for execution on the event loop. The task it returns should be awaited at some point, but the underlying coroutine starts running as soon as the event loop regains control, i.e. the next time the calling coroutine suspends, not only when you await the task. You can add this to the example script above:

from asyncio import create_task
...
async def task_sleeps() -> None:
    t3 = create_task(sleep_and_print(3))
    t2 = create_task(sleep_and_print(2))
    t1 = create_task(sleep_and_print(1))
    await t3
    await t2
    await t1

async def main() -> None:
    ...
    start = time()
    await task_sleeps()
    print("task_sleeps took", round(time() - start, 1), "seconds\n")

And you'll see the following again:

...
slept 1 seconds
slept 2 seconds
slept 3 seconds
task_sleeps took 3.0 seconds
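To see when a task actually starts, here is a small sketch (worker and the log list are hypothetical): create_task schedules the coroutine, but it only gets to run once the creating coroutine suspends, here at await asyncio.sleep(0).

```python
import asyncio

log: list[str] = []


async def worker() -> None:
    # Hypothetical coroutine that only records when it begins running
    log.append("worker started")


async def main() -> None:
    task = asyncio.create_task(worker())  # scheduled, but not running yet
    log.append("after create_task")
    await asyncio.sleep(0)  # first point where main yields to the event loop
    log.append("after sleep(0)")
    await task


asyncio.run(main())
print(log)  # ['after create_task', 'worker started', 'after sleep(0)']
```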

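Tasks are a nice option to decouple the execution of some coroutine from its surrounding context to an extent, but you need to keep track of them: keep a reference to each task until it is done (the event loop itself only holds weak references to tasks) and await or cancel it at some point, so that its result or exception does not get lost.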
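One common way to keep track of fire-and-forget tasks, following the pattern suggested in the asyncio documentation, is a set of strong references that each task removes itself from when it finishes. A sketch with a hypothetical job coroutine:

```python
import asyncio

# Strong references to running tasks; the event loop itself keeps only weak ones
background_tasks: set[asyncio.Task] = set()


async def job(n: int) -> int:
    # Hypothetical unit of work
    await asyncio.sleep(0.01 * n)
    return n * n


async def main() -> list[int]:
    for n in range(3):
        task = asyncio.create_task(job(n))
        background_tasks.add(task)
        # Remove the reference automatically once the task has finished
        task.add_done_callback(background_tasks.discard)
    # ... other work could happen here while the tasks run ...
    # Await them eventually so results and exceptions are not silently lost;
    # sorted() because iterating a set has no guaranteed order
    return sorted(await asyncio.gather(*background_tasks))


results = asyncio.run(main())
print(results)  # [0, 1, 4]
```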
