Scaling up a FastAPI multi-tenant application

Time:07-02

I am trying to understand how to scale up FastAPI for our app. Our application is currently structured like the snippet below, so we don't use async calls. The application is multi-tenant and we expect large payloads (~10 MB per request).

from fastapi import FastAPI
import psycopg2

app = FastAPI()


@app.get("/")
def root():
    # blocking call: a psycopg2 SELECT ... or ML model inference
    # that takes 2-3 minutes to complete
    with psycopg2.connect("dbname=... user=...") as conn:
        with conn.cursor() as cur:
            cur.execute("SELECT ...")
            rows = cur.fetchall()
    return {"message": "Hello World"}

While one API call is being handled, the next user has to wait before their request even starts, which is what we don't want. I can increase from 1 worker to 4-6 workers (gunicorn), so that 4-6 users can use the app independently. Does that mean we can handle 4-6x more load, or is it less?
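For reference, gunicorn can read its settings from a Python config file; a minimal sketch (file name and values are illustrative, tune them for your hardware) running FastAPI under uvicorn workers might look like:

```python
# gunicorn.conf.py -- illustrative values, not a recommendation
import multiprocessing

# one worker per CPU core is a common starting point
workers = multiprocessing.cpu_count()
worker_class = "uvicorn.workers.UvicornWorker"
bind = "0.0.0.0:8000"
timeout = 300  # requests that run for minutes need a generous timeout
```

Note that with synchronous endpoints, each worker still handles one request at a time, so N workers gives you at most N concurrent long-running requests, not N times the throughput in general.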

We were thinking that by switching to async and using an async Postgres driver (asyncio-based) we could get more throughput. I assume the database will then become the bottleneck soon? We also did some performance testing, and according to our tests this approach cut response time in half.
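The effect of going async can be sketched with the stdlib alone (a simulated slow query stands in for a real async DB driver): while one coroutine awaits I/O, the event loop is free to serve other requests, so concurrent requests overlap instead of queueing.

```python
import asyncio
import time


async def fake_query(seconds: float) -> str:
    # stands in for an awaitable DB call; while we await,
    # the event loop can run other requests
    await asyncio.sleep(seconds)
    return "rows"


async def main() -> float:
    start = time.perf_counter()
    # ten "requests" hitting the endpoint concurrently
    await asyncio.gather(*(fake_query(0.1) for _ in range(10)))
    return time.perf_counter() - start


elapsed = asyncio.run(main())
# the ten 0.1 s queries overlap: total is roughly 0.1 s, not 1 s
print(f"{elapsed:.2f}s")
```

The same principle applies to an async Postgres driver: the gain comes from overlapping time spent waiting on the database, which is why the database itself tends to become the next bottleneck.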

How can we scale our application further if we want to handle 1000 concurrent users at peak times? What should we take into consideration?

CodePudding user response:

First of all: does this processing need to be synchronous? I mean, is the user actually waiting for the response of this processing that takes 2-3 minutes? It is not recommended to have APIs that take that long to respond.

If your user doesn't need to wait until it finishes, you have a few options:

  1. You can use Celery and run this processing asynchronously as a background task. Celery is commonly used for exactly this kind of thing, where you have huge queries or heavy processing that takes a while and can be done out of band.
  2. You can also use FastAPI's built-in background tasks, which let you run work in the background after the response is sent.
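The pattern both options implement can be sketched with the stdlib alone (names here are made up for illustration): the endpoint only enqueues the job and returns immediately, while a worker does the slow part in the background.

```python
import queue
import threading
import time

jobs: "queue.Queue[str]" = queue.Queue()
results: dict = {}


def worker() -> None:
    # stands in for a Celery worker / background task runner
    while True:
        job_id = jobs.get()
        time.sleep(0.05)  # the slow query / ML inference would run here
        results[job_id] = "done"
        jobs.task_done()


threading.Thread(target=worker, daemon=True).start()


def handle_request(job_id: str) -> dict:
    # the endpoint only enqueues and returns -- the caller never blocks
    jobs.put(job_id)
    return {"job_id": job_id, "status": "accepted"}


response = handle_request("job-1")
jobs.join()  # in practice the client would poll a status endpoint instead
```

With Celery or FastAPI background tasks the queue and worker are managed for you; the key point is the same: the request returns in milliseconds and the 2-3 minute job runs elsewhere.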

If you do it this way, you will be able to scale your application easily. Note that Celery currently doesn't support asyncio natively, so you would not be able to use async there unless you implement a few tweaks yourself.

About scaling the number of workers: FastAPI recommends using your container infrastructure to manage the number of replicas, so instead of having gunicorn manage many workers, you could simply scale the number of replicas of your service. If you are not using containers, gunicorn can manage the worker processes for you, and you can tune the worker count up based on the number of requests you are receiving.

If none of the answers above fit your case, I'd suggest:

  1. Use an async Postgres driver so that while your query is running, FastAPI is able to receive requests from other users. Note that if your query result is huge, you might need a lot of memory to do what you are describing.
  2. Create some sort of autoscaling based on response time / requests per second, so you can scale your application as you receive more requests.
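A toy version of such a scaling rule (the function name and thresholds are invented for illustration; real autoscalers like Kubernetes HPA work on similar metrics) just derives a replica count from observed load:

```python
import math


def desired_replicas(rps: float, rps_per_replica: float,
                     max_replicas: int = 20) -> int:
    """Hypothetical autoscaling rule: enough replicas for the observed
    request rate, never below 1 or above max_replicas."""
    target = math.ceil(rps / rps_per_replica)
    return max(1, min(max_replicas, target))


# e.g. 1000 users at ~1 request/s each, 60 rps handled per replica
print(desired_replicas(rps=1000, rps_per_replica=60))  # -> 17
```

The important consideration for your 1000-user peak is measuring how many requests per second a single replica actually sustains with your 10 MB payloads and long-running queries, then sizing the replica count and the database connection pool from that number.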