Limiting the number of times an endpoint of a Kubernetes pod can be accessed?


I have a machine learning model inside a Docker image. I pushed the image to Google Container Registry and then deployed it inside a Kubernetes pod. A FastAPI application runs on port 8000, and its endpoint is public (call it mymodel:8000).

The structure of the FastAPI app is:

app.get("/homepage")
asynd def get_homepage()

app.get("/model):
aysnc def get_modelpage()

app.post("/model"):
async def get_results(query: Form(...))

Users can submit queries and get results from the machine learning model running inside the container. I want to limit the number of queries that all users combined can make. So if the query limit is 100, all users combined can make only 100 queries in total.

I thought of a way to do this:

Keep a database that stores the number of times the GET and POST methods have been called. As soon as the total number of POST calls crosses the limit, stop accepting any more queries.
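For illustration, here is a minimal sketch of that idea using a FastAPI dependency. The counter lives in process memory (it resets on pod restart and is not shared across replicas), so a real setup would keep it in a database or Redis instead; QUERY_LIMIT and consume_query_quota are just illustrative names.

import asyncio

from fastapi import Depends, FastAPI, Form, HTTPException

QUERY_LIMIT = 100  # total POST /model calls allowed across all users
app = FastAPI()
_query_count = 0
_lock = asyncio.Lock()

async def consume_query_quota():
    # Reject the request with 429 once the combined limit is used up.
    global _query_count
    async with _lock:
        if _query_count >= QUERY_LIMIT:
            raise HTTPException(status_code=429, detail="Query limit reached")
        _query_count += 1

@app.post("/model")
async def get_results(query: str = Form(...), _quota: None = Depends(consume_query_quota)):
    ...  # run the model on `query` and return the result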

Is there an alternative way of doing this using Kubernetes limits? For example, something like a limit_api_calls setting such that the total number of times mymodel:8000 can be accessed is at most limit_api_calls.

I looked at the documentation and could only find settings for CPU limits, memory limits and rate limits.

CodePudding user response:

There are several approaches that could satisfy your needs.

  • Custom implementation: as you mentioned, keep the number of API calls received in a persistence layer and deny requests once the limit has been reached (a minimal version of this is sketched above in the question).
  • Use a service mesh: Istio (for instance) will let you limit the number of requests received and act as a circuit breaker.
  • Use an external API manager: Apigee will also let you limit and even charge your users; however, if it is only for internal use (not pay per use) I definitely would not recommend it.

The tricky part is what you want to happen after the limit has been reached. If it is just a single pod, you may simply exit the application to finish and clear it.
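If exiting really is what you want for a single bare pod, one possible sketch (assuming the app runs under uvicorn, which treats SIGTERM as a graceful-shutdown signal, and that the pod's restartPolicy will not simply start it again) is to signal your own process once the quota from the sketch above is exhausted:

import os
import signal

async def consume_query_quota():
    # Same counter as above, but trigger a graceful shutdown once the
    # last allowed query has been accepted.
    global _query_count
    async with _lock:
        if _query_count >= QUERY_LIMIT:
            raise HTTPException(status_code=429, detail="Query limit reached")
        _query_count += 1
        if _query_count == QUERY_LIMIT:
            os.kill(os.getpid(), signal.SIGTERM)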

Otherwise, if you have a Deployment with its ReplicaSet and several resources associated with it (like ConfigMaps), you probably want some kind of asynchronous alert or polling check to clean up everything related to the deployment. You may want to take a deeper look at orchestrators like Airflow (Cloud Composer) and use tools such as Helm to keep deployments manageable.
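If you go the programmatic route, a rough sketch with the official kubernetes Python client could delete the Deployment and an associated ConfigMap from inside the cluster. The resource names and namespace below are placeholders, and the pod's ServiceAccount would need RBAC permission to delete these objects:

from kubernetes import client, config

def tear_down(namespace: str = "default") -> None:
    # Assumes this code runs inside the cluster.
    config.load_incluster_config()
    apps = client.AppsV1Api()
    core = client.CoreV1Api()
    # "mymodel" / "mymodel-config" are placeholder resource names.
    apps.delete_namespaced_deployment(name="mymodel", namespace=namespace)
    core.delete_namespaced_config_map(name="mymodel-config", namespace=namespace)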
