My team and I are currently building an API with FastAPI, and we're really struggling to get good performance out of it once it's deployed to Kubernetes. We're using async calls as much as possible, but we're sitting at ~8 RPS per pod in order to stay under our SLA of 200 ms at P99.
For resources, we assign the following:
resources:
  limits:
    cpu: 1
    memory: 800Mi
  requests:
    cpu: 600m
    memory: 100Mi
Surprisingly, these performance drops don't occur when we load test the API running locally in a Docker container: there we easily get ~200 RPS on a single container with 120 ms latency at P99...
Would anyone have an idea of what could be going wrong here, and where I should start looking to find the bottleneck?
Cheers!
CodePudding user response:
First, try to request at least 1 CPU for your API. If there are no spare CPUs on the node, the pod only gets CPU time proportional to its request, which here is 600m. So if another application on the same node requests cpu=400m, for example, Kubernetes will run both applications on the same CPU, with roughly 60% of the time going to your API and 40% to the other application. Locally, Docker can use a full CPU (and possibly more).
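As a rough illustration of where that 60/40 split comes from (assuming cgroup v1, where a CPU request is turned into CFS shares as request_millicores * 1024 / 1000):

  600m -> ~614 shares
  400m -> ~409 shares
  614 / (614 + 409) ≈ 0.60  -> ~60% of the contended CPU for the API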
If you are using Uvicorn with multiple workers, you can also increase the CPU limit to at least 2:
resources:
  limits:
    cpu: 2
    memory: 800Mi
  requests:
    cpu: 1
    memory: 100Mi
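For reference, running Uvicorn with multiple workers looks roughly like this (the module path app.main:app and the worker count are placeholders to adapt to your application):

  uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 2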
Finally, there is a difference between your local machine's CPUs and the Kubernetes cluster's CPUs. If you want better performance, you can benchmark different CPU types and choose the most suitable one in terms of cost.
CodePudding user response:
It finally turned out that our performance issues were caused by running only uvicorn, without gunicorn (even though FastAPI's author recommends against using gunicorn in his documentation). The Uvicorn authors recommend the opposite in their docs, i.e. running Uvicorn workers under gunicorn. We followed that advice and our performance issues were gone.
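For anyone else hitting this, the switch amounts to something like the following (again, app.main:app and the worker count are placeholders for your own setup):

  gunicorn app.main:app --worker-class uvicorn.workers.UvicornWorker --workers 2 --bind 0.0.0.0:8000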
As suggested by people in this thread, raising the CPU request in our PodSpec was also part of the solution.