I have a FastAPI app running on an AWS EC2 instance. My endpoints work well, but I have a problem handling concurrent requests to one endpoint.
This endpoint takes around 10 s to complete and is very CPU-heavy (it runs neural network computations). I have run up to 5 simultaneous calls to it without errors, and all of them complete in around 10 s. However, if I go for a sixth, the system starts to fail. I start getting two errors:
"Network error communicating with endpoint"
"Endpoint request timed out"
Then the EC2 instance is not even accessible through SSH, although the EC2 panel says it is "available". Any idea how I can solve this problem? Maybe by limiting the API to 90% of the CPU power?
To communicate with this EC2 instance I send the information through an API Gateway. There are no connections other than mine, since I'm the only one accessing it.
NOTE: If I run top, I see that for 5 calls the CPU sits around 380% (the instance has 4 CPUs), but RAM goes up to 83%. I guess this is a problem with my RAM usage?
Answer:
Thank you for the comments. I found that the models I was using with torch were saving gradients in memory, so I had to run inference with:
    with torch.no_grad():
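For context, a minimal sketch of what that looks like inside an endpoint; the model and the endpoint shape here are placeholders, not my actual app:

    import torch
    from fastapi import FastAPI

    app = FastAPI()

    # Stand-in for the real network; load your own model here
    model = torch.nn.Linear(4, 2)
    model.eval()  # inference mode: no dropout, frozen batch-norm stats

    @app.post("/predict")
    def predict(features: list[float]):
        x = torch.tensor(features).unsqueeze(0)
        # no_grad() keeps autograd from recording the graph, so
        # activations and gradients are not held in RAM per request
        with torch.no_grad():
            y = model(x)
        return {"result": y.squeeze(0).tolist()}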
In case you are interested, you can check and limit the RAM of your Python app following https://stackoverflow.com/a/41125461/1200914, and you can look at the size of your variables using https://stackoverflow.com/a/24455637/1200914. However, I could only spot the memory leak with the first link.
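As far as I can tell, the first link boils down to Python's resource module on Linux; a minimal sketch (the 2 GB cap is just an example value):

    import resource

    def limit_memory(max_bytes: int) -> None:
        # Cap this process's virtual address space; allocations past the
        # limit raise MemoryError instead of taking down the whole box
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, hard))

    limit_memory(2 * 1024 ** 3)  # allow at most ~2 GB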
Once RAM usage stabilized, I stopped getting these errors, although now it's time to build a queue for the CPU workload...
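One simple way to build that queue, sketched with the same placeholder model as above: a single-worker thread pool, so only one inference runs at a time and the other requests wait their turn instead of piling onto the CPU together:

    import asyncio
    from concurrent.futures import ThreadPoolExecutor

    import torch
    from fastapi import FastAPI

    app = FastAPI()
    model = torch.nn.Linear(4, 2)  # placeholder for the real network
    model.eval()

    # One worker = one inference at a time; extra requests queue up here.
    # Raise max_workers if the instance can handle more in parallel.
    executor = ThreadPoolExecutor(max_workers=1)

    def run_model(features: list[float]) -> list[float]:
        x = torch.tensor(features).unsqueeze(0)
        with torch.no_grad():
            return model(x).squeeze(0).tolist()

    @app.post("/predict")
    async def predict(features: list[float]):
        loop = asyncio.get_running_loop()
        # Hand the heavy call to the executor so the event loop stays free
        result = await loop.run_in_executor(executor, run_model, features)
        return {"result": result}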