FastAPI runs API calls in serial instead of parallel fashion

I have the following code:

import time
from fastapi import FastAPI, Request

app = FastAPI()

@app.get("/ping")
async def ping(request: Request):
    print("Hello")
    time.sleep(5)
    print("bye")
    return {"ping": "pong!"}

If I run my code on my local server and open, e.g., http://localhost:8501/ping in different tabs of the same Firefox window, I get:

    Hello
    bye
    Hello
    bye
    ...

Instead of:

    Hello
    Hello
    bye
    bye

I have read about using httpx, but I still cannot achieve true parallelization. What's the problem?

Answer:

As per FastAPI's documentation:

When you declare a path operation function with normal def instead of async def, it is run in an external threadpool that is then awaited, instead of being called directly (as it would block the server).

In other words, def (sync) routes run in a separate thread from a threadpool, so the server processes such requests concurrently, whereas async def routes run directly on the event loop in the main (single) thread. There, the server processes the requests sequentially as long as there is no await call inside such a route. When an await is reached, execution of that particular async route is paused until the awaited operation completes; however, await does not block anything outside of that route, so the event loop can serve other requests in the meantime. (Asynchronous code written with async and await is often summarised as using coroutines.) This also means that a blocking operation, such as time.sleep(), inside an async def route will block the entire server, as in your case.
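To see the difference without FastAPI or a browser, here is a minimal sketch that uses plain asyncio coroutines as stand-ins for two requests arriving at the same time (the handler names and the 0.2 s delay are illustrative assumptions, not FastAPI internals). The version calling time.sleep() runs each handler to completion before the next one starts, while the version awaiting asyncio.sleep() interleaves them:

```python
import asyncio
import time

async def blocking_handler(log, name):
    log.append(f"Hello {name}")
    time.sleep(0.2)  # blocks the whole event loop: no other coroutine can run
    log.append(f"bye {name}")

async def non_blocking_handler(log, name):
    log.append(f"Hello {name}")
    await asyncio.sleep(0.2)  # suspends only this coroutine; the loop serves others
    log.append(f"bye {name}")

async def run_two(handler):
    # simulate two requests arriving at (almost) the same time
    log = []
    await asyncio.gather(handler(log, "A"), handler(log, "B"))
    return log

print(asyncio.run(run_two(blocking_handler)))
# ['Hello A', 'bye A', 'Hello B', 'bye B']
print(asyncio.run(run_two(non_blocking_handler)))
# ['Hello A', 'Hello B', 'bye A', 'bye B']
```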

Therefore, if your function is not going to make any async calls, you should use the def keyword instead, as shown below:

@app.get("/ping")
def ping(request: Request):
    #print(request.client)
    print("Hello")
    time.sleep(5)
    print("bye")
    return {"ping": "pong!"}
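A rough, stand-alone emulation of what the quoted documentation describes, assuming a plain concurrent.futures.ThreadPoolExecutor in place of the thread pool FastAPI actually uses (Starlette's run_in_threadpool, built on AnyIO): each sync handler runs in a worker thread and is then awaited, so the event loop never blocks and the two calls overlap. The names sync_ping and serve_two are hypothetical.

```python
import asyncio
import threading
import time
from concurrent.futures import ThreadPoolExecutor

def sync_ping(log, name):
    # a plain `def` handler with a blocking call, like the route above
    log.append(f"Hello {name}")
    time.sleep(0.2)
    log.append(f"bye {name}")
    return threading.get_ident()  # report which worker thread ran this call

async def serve_two(log):
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=4) as pool:
        # run each sync call in a worker thread and await the result,
        # so the event loop itself is never blocked
        return await asyncio.gather(
            loop.run_in_executor(pool, sync_ping, log, "A"),
            loop.run_in_executor(pool, sync_ping, log, "B"),
        )

log = []
thread_ids = asyncio.run(serve_two(log))
print(log)         # both "Hello" entries come before both "bye" entries
print(thread_ids)  # two different worker-thread ids
```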

Otherwise, if you are going to call async functions that you have to await, you should use the async def keyword. To demonstrate this, the example below uses the asyncio.sleep() function from the asyncio library. A similar example is given here as well (you could also take a look at this answer).

import asyncio
 
@app.get("/ping")
async def ping(request: Request):
    print("Hello")
    await asyncio.sleep(5)
    print("bye")
    return {"ping": "pong!"}

Both of the above will print the expected output mentioned in your question.

Note: When you call your endpoint for the second (third, and so on) time, remember to do so from a tab that is isolated from the browser's main session; otherwise, the requests are shown as coming from the same client (you could check this using print(request.client): the port number would be the same if both tabs are opened in the same window), and hence the requests are processed sequentially. You could either reload the same tab (while the first request is still running), open a new tab in an incognito window, or use another browser/client to send the request.
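If you prefer a script over browser tabs, a minimal client sketch using only the standard library could look like this. fire_concurrent and fetch are hypothetical helper names, and the fetcher parameter exists only so the function can be tried without a running server:

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # urllib.request does not pool connections, so every call is a separate client connection
    with urllib.request.urlopen(url) as response:
        return response.read()

def fire_concurrent(url, n=2, fetcher=fetch):
    """Send n GET requests to url at the same time; return (bodies, elapsed_seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n) as pool:
        bodies = list(pool.map(fetcher, [url] * n))
    return bodies, time.perf_counter() - start
```

For example, fire_concurrent("http://localhost:8501/ping", 2) against one of the fixed routes should take roughly 5 seconds in total rather than 10, because the two requests overlap.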

If you are required to use the async def keyword (as you might need to await coroutines), but also have some long synchronous computation that might block the server and prevent other requests from going through, you would need to use more workers (e.g., uvicorn main:app --workers 4), or explore other solutions, as described in this answer.
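For blocking calls that release the GIL (such as time.sleep() or blocking I/O), one such option is to keep the route async def and push only the blocking part into a worker thread. The sketch below uses the standard library's asyncio.to_thread() (Python 3.9+) with hypothetical blocking_call and ping stand-ins; inside FastAPI, fastapi.concurrency.run_in_threadpool serves a similar purpose. For CPU-heavy pure-Python computation, threads still compete for the GIL, which is why more worker processes remain the suggestion there:

```python
import asyncio
import time

def blocking_call(seconds):
    # hypothetical stand-in for a blocking library call that has no async version
    time.sleep(seconds)
    return seconds

async def ping(log, name):
    log.append(f"Hello {name}")
    # run the blocking call in a worker thread, so the event loop
    # can keep serving other requests in the meantime
    await asyncio.to_thread(blocking_call, 0.2)
    log.append(f"bye {name}")

async def main():
    log = []
    start = time.perf_counter()
    await asyncio.gather(ping(log, "A"), ping(log, "B"))
    return log, time.perf_counter() - start

log, elapsed = asyncio.run(main())
print(log[:2])             # ['Hello A', 'Hello B']
print(f"{elapsed:.2f} s")  # roughly 0.2 s rather than 0.4 s
```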

Answer:

Q:
"... What's the problem?"

A:
The FastAPI documentation explicitly says that the framework uses in-process tasks (as inherited from Starlette).

That, by itself, means that all such tasks compete to receive (from time to time) the Python Interpreter's GIL-lock, effectively a MUTEX-terrorising Global Interpreter Lock, which in effect re-[SERIAL]-ises the execution of any and all Python Interpreter in-process threads, so that one-and-only-one-WORKS-while-all-others-stay-waiting...

On a fine-grained scale, you see the result: if spawning another handler for the second arriving HTTP request (manually initiated from a second Firefox tab) actually takes longer than the sleep itself, what you observe is the result of GIL-lock-interleaved, round-robin time-quanta (all wait, one can work, for about 5 [ms] by default in CPython 3.2+, see sys.getswitchinterval(), before each next round of the GIL-lock release-acquire roulette takes place). The Python Interpreter's internal work does not show more details on its own; you may use the tools from here (depending on O/S type or version) to see more in-thread LoD (level of detail), for example by tracing like this inside the async-decorated code being performed:

import time
import threading
from   fastapi import FastAPI, Request

TEMPLATE = "INF[{0:_>20d}]: t_id( {1: >20d} ):: {2:}"

print( TEMPLATE.format( time.perf_counter_ns(),
                        threading.get_ident(),
                       "Python Interpreter __main__ was started ..."
                        )
       )
...
@app.get("/ping")
async def ping( request: Request ):
        """                                __doc__
        [DOC-ME]
        ping( Request ):  a mock-up AS-IS function to yield
                          a CLI/GUI self-evidence of the order-of-execution
        RETURNS:          a JSON-alike decorated dict

        [TEST-ME]         ...
        """
        print( TEMPLATE.format( time.perf_counter_ns(),
                                threading.get_ident(),
                               "Hello..."
                                )
               )
        #------------------------------------------------- actual blocking work
        time.sleep( 5 )
        #------------------------------------------------- actual blocking work
        print( TEMPLATE.format( time.perf_counter_ns(),
                                threading.get_ident(),
                               "...bye"
                                )
               )
        return { "ping": "pong!" }
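A stand-alone sketch of the same tracing, with no FastAPI or browser involved, under the assumption that two coroutines gathered on one asyncio event loop are a fair stand-in for two requests to the async def route (trace, ping, and main are illustrative names). Every printed line should carry the same t_id, i.e. both handlers execute on the single event-loop thread, and the blocking time.sleep() keeps the second one from starting until the first has finished:

```python
import asyncio
import threading
import time

TEMPLATE = "INF[{0:_>20d}]: t_id( {1: >20d} ):: {2:}"
seen_thread_ids = []

def trace(message):
    # same format as above: timestamp [ns], thread id, message
    tid = threading.get_ident()
    seen_thread_ids.append(tid)
    print(TEMPLATE.format(time.perf_counter_ns(), tid, message))

async def ping(name):
    trace(f"Hello... {name}")
    time.sleep(0.1)  # the blocking call from the question, shortened
    trace(f"...bye {name}")

async def main():
    # two "requests" handled by coroutines on one event loop
    await asyncio.gather(ping("A"), ping("B"))

asyncio.run(main())
```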

Last, but not least, do not hesitate to read more about all the other sharks that thread-based code may suffer from ... or even cause ... behind the curtains ...

Ad Memorandum:
a mixture of GIL-lock, thread-based pools, asynchronous decorators, blocking, and event-handling is a sure recipe for uncertainties & HWY2HELL

;o)