I don't understand how replication works in Kubernetes.
I understand that two replicas on different nodes will provide fault tolerance for the application, but I don’t understand this:
Suppose the application is given the following resources:
---
apiVersion: v1
kind: Pod
metadata:
  name: frontend
spec:
  containers:
  - name: app
    image: images.my-company.example/app:v4
    resources:
      requests:
        memory: "1G"
        cpu: "1"
      limits:
        memory: "1G"
        cpu: "1"
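For context, the replica count is not part of the Pod spec itself; it comes from a Deployment (or ReplicaSet) that stamps out identical Pods. A minimal sketch wrapping the same container spec, where the Deployment name and labels are assumptions:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend            # hypothetical name
spec:
  replicas: 2               # two independent Pods, each with its own 1 CPU / 1G
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
      - name: app
        image: images.my-company.example/app:v4
        resources:
          requests:
            memory: "1G"
            cpu: "1"
          limits:
            memory: "1G"
            cpu: "1"
```

Note that each replica gets its own 1 CPU / 1G allocation; the resources are not pooled into one shared 2 CPU / 2G budget.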
The application has two replicas, so in total 2 CPUs and 2G of RAM are available to it.
But what happens if the application receives a request that needs 1.75G? After all, only 1G of RAM is available in each replica. Will the request be distributed among all the replicas?
Reply to Harsh Manvar:
Maybe you misunderstood me?
What you explained is not entirely true.
Here is a real, working deployment of four replicas:
$ kubectl get deployment dev-phd-graphql-server-01-master-deployment
NAME READY UP-TO-DATE AVAILABLE AGE
dev-phd-graphql-server-01-master-deployment 4/4 4 4 6d15h
$ kubectl describe deployment dev-phd-graphql-server-01-master-deployment
...
  Limits:
    cpu:     2
    memory:  4G
  Requests:
    cpu:     2
    memory:  4G
...
CodePudding user response:
No, the request won't get distributed. One replica will simply start, and the other will stay in the Pending state.
If you describe that pending Pod (replica) with
kubectl describe pod POD-name
it will show this error:
0/1 nodes available: insufficient cpu, insufficient memory
Kubernetes checks the requested resources:
requests:
  memory: "1G"
  cpu: "1"
If the minimum requested resources are available on a node, it will deploy the replica there; otherwise the replica goes into the Pending state.
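When the scheduler cannot place a replica, the Pod itself records why. A sketch of the status fragment such a Pending Pod carries (the exact message text varies by cluster size and failure reason):

```yaml
status:
  phase: Pending
  conditions:
  - type: PodScheduled        # the scheduler could not bind this Pod to a node
    status: "False"
    reason: Unschedulable
    message: '0/1 nodes are available: 1 Insufficient cpu, 1 Insufficient memory.'
```

You can see this with kubectl get pod POD-name -o yaml or in the Events section of kubectl describe pod POD-name.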
Update
But what happens if the application receives a request with a size of 1.75G? After all, only 1G RAM is available in one replica.
requests:
  memory: "1G"
  cpu: "1"
limits:
  memory: "1G"
  cpu: "1"
If you have set a request of 1 GB and the application starts using 1.75 GB, the container will be killed or restarted because it hits the 1 GB memory limit.
In some cases, though, a container can use more than its memory request (not its limit) if the node has memory available.
A Container can exceed its memory request if the Node has memory available. But a Container is not allowed to use more than its memory limit. If a Container allocates more memory than its limit, the Container becomes a candidate for termination. If the Container continues to consume memory beyond its limit, the Container is terminated. If a terminated Container can be restarted, the kubelet restarts it, as with any other type of runtime failure.
Read more at: https://kubernetes.io/docs/tasks/configure-pod-container/assign-memory-resource/#exceed-a-container-s-memory-limit
You might also want to read: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#how-pods-with-resource-limits-are-run
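The memory-limit task linked above demonstrates this behavior with a stress container that tries to allocate 250 MiB against a 100 MiB limit, so it is OOM-killed and restarted. A sketch along the same lines:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: memory-demo-2
spec:
  containers:
  - name: memory-demo-2-ctr
    image: polinux/stress
    resources:
      requests:
        memory: "50Mi"
      limits:
        memory: "100Mi"       # allocation below tries 250M, exceeding this limit
    command: ["stress"]
    args: ["--vm", "1", "--vm-bytes", "250M", "--vm-hang", "1"]
```

After applying this, kubectl get pod memory-demo-2 will show the container cycling through OOMKilled and restarting.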
CodePudding user response:
For these kinds of circumstances you need an idea of how large a request can get, and you should set up your resource requests and limits accordingly.
If you expect a request as big as 1.75 GB, you have to tackle it in your source code.
For example, you might have a conversion job that takes a lot of resources. You can make it a Celery task and host the Celery worker in another node group sized for large tasks (an AWS t3.xlarge, for example).
In any case, such large tasks won't produce a result immediately, so I don't see a problem with running them asynchronously and returning the result later, perhaps even in a WebSocket message. This keeps your main server from getting clogged up and also helps you scale your large tasks efficiently.
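One way to host such a worker on its own node group is a nodeSelector. A sketch, where the Deployment name, image, and the workload label are all hypothetical (the label would be applied to the nodes of the t3.xlarge node group):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: celery-heavy-worker            # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: celery-heavy-worker
  template:
    metadata:
      labels:
        app: celery-heavy-worker
    spec:
      nodeSelector:
        workload: heavy-tasks          # hypothetical label on the large-instance node group
      containers:
      - name: worker
        image: my-registry.example/worker:latest   # hypothetical image
        command: ["celery", "-A", "tasks", "worker", "--loglevel=info"]
        resources:
          requests:
            memory: "4G"
            cpu: "2"
          limits:
            memory: "8G"
            cpu: "4"
```

This keeps the heavy Celery workers off the nodes serving your main application, so a large conversion job cannot starve the web-facing replicas.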