We have a 4-node EKS cluster. Some pods that belong to a DaemonSet are stuck in Pending because their node is full and has no capacity left to run them. Do we need to manually reshuffle workloads to get the DaemonSet pods running in this situation, or is there a configuration that handles this automatically?
Note: we have also installed Cluster Autoscaler which works perfectly for deployments.
Thank you in advance.
CodePudding user response:
Kubernetes has pod priorities and preemption for this specific purpose.
Pods can have priority. Priority indicates the importance of a Pod relative to other Pods. If a Pod cannot be scheduled, the scheduler tries to preempt (evict) lower priority Pods to make scheduling of the pending Pod possible.
ref: https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/
If EKS does not have priority classes pre-configured, you can create one yourself. For example, the one from the docs which is a preempting one:
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "This priority class should be used for XYZ service pods only."
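Then you reference that class via priorityClassName. The docs example below is a bare Pod; for a DaemonSet the same field goes in the pod template, under .spec.template.spec: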
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    env: test
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  priorityClassName: high-priority # this is important
Note that this is just a small example copied from the linked docs. You should read the docs carefully and also review how preemption interacts with pod disruption budgets.
Also note that preempting lower-priority pods may disrupt other deployments, depending on factors such as their update strategy. So, be careful.
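Since the question is about a DaemonSet rather than a single Pod, here is a minimal sketch of where the field would go. The DaemonSet name, labels, image, and resource requests are placeholders I made up for illustration; only high-priority refers to the PriorityClass defined above.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-agent                # hypothetical name
spec:
  selector:
    matchLabels:
      app: node-agent             # hypothetical label
  template:
    metadata:
      labels:
        app: node-agent
    spec:
      # Must match the name of an existing PriorityClass.
      priorityClassName: high-priority
      containers:
      - name: agent
        image: nginx              # placeholder image
        resources:
          requests:               # placeholder values
            cpu: 100m
            memory: 128Mi
```

With this in place, the scheduler can evict lower-priority pods from a full node to make room for the DaemonSet pod. Pods with the same or a higher priority are not preempted.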
CodePudding user response:
As those pods are part of a DaemonSet, they are expected to be scheduled on every node in the cluster, which means you have to make room for them on the nodes where they are failing to schedule.
If you wrote the DaemonSet yourself, you can specify .spec.template.spec.nodeSelector, and the DaemonSet controller will create Pods only on nodes that match that node selector. Likewise, if you specify .spec.template.spec.affinity, the DaemonSet controller will create Pods only on nodes that match that node affinity. If you specify neither, the DaemonSet controller will create Pods on all nodes, as per the official documentation. If the DaemonSet comes from a third party, check whether it already exposes any of these scheduling options.
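As a rough sketch of the nodeSelector option, assuming a hypothetical node label workload-tier=monitoring that you would have to apply to the target nodes yourself (the DaemonSet name, labels, and image are placeholders too):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-agent                  # hypothetical name
spec:
  selector:
    matchLabels:
      app: node-agent               # hypothetical label
  template:
    metadata:
      labels:
        app: node-agent
    spec:
      # Pods are created only on nodes carrying this label.
      nodeSelector:
        workload-tier: monitoring   # hypothetical node label
      containers:
      - name: agent
        image: nginx                # placeholder image
```

Nodes without the label get no pod from this DaemonSet, so this only helps if the pod does not actually need to run on the nodes that are full.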
You can also consider increasing the node size (the instance type) of the node group, but be careful with that: nodes are immutable, so changing the instance type means replacing the existing nodes with new instances or creating a new node group. For a complete answer on updating Node instance type refer here