I have to retype most of this by hand since the system I'm testing on can't currently connect to the internet, so please forgive any obvious typos.
We are programmatically scheduling deployments into a large sandbox with 19 nodes, 16 of which are workers. Usually we scan the available nodes for the one with the most available memory/CPU and select it for the new deployment, though given the affinity below I'm wondering whether this particular deployment is being created through some other part of our code, since it has no nodeAffinity at all.
Either way, deployment usually works, but occasionally a pod will fail to schedule:
0/19 nodes are available: 16 node(s) didn't match pod affinity rules, 16 node(s) didn't match pod affinity/anti-affinity, 3 node(s) had taint (node-role.kubernetes.io/controlplane: true), that the pod didn't tolerate
I've used kubectl to look up the pods' affinities after they are created. We have multiple nearly identical pods; both the ones that can be scheduled and the one that can't appear to have identical affinities:
"podAffinity": {
"requiredDuringSchedulingIgnoreDuringExecution": [
{
"labelSelector": {
"matchExpressions: " [
{
"key": "app.kubernetes.io/instance",
"operator": "In",
"values": [
<instance name>
]
},
{
"key": "host",
"operator": "In",
"values": [
"yes"
]
}
]
},
"topologyKey": "kubernetes.io/hostname"
}
]
}
I get this by looking at spec.affinity:
kubectl get pods <pod_name> -o json | jq '.spec.affinity'
I thought I understood affinity, but clearly not, because I can't find any 'host' label on the pod or the node. I also don't understand why the pod affinity would prevent the pod from being scheduled on a node.
More importantly, I don't understand what a host of "yes" means. It's not literally looking for a label with a value of "yes", is it?
Since I don't understand how the affinity works when assigning a functional pod, I really don't understand why the same affinity occasionally fails. I'd appreciate any help in understanding what the affinity is actually doing or why it may occasionally fail.
CodePudding user response:
It's about pod affinity, not node affinity. So the labels are expected to be on running pods.
To schedule the pod, your code requires (requiredDuringSchedulingIgnoredDuringExecution) that there is already a pod running on the same node ("topologyKey": "kubernetes.io/hostname") that has matching labels:
apiVersion: v1
kind: Pod
metadata:
  name: foo
  labels:
    "app.kubernetes.io/instance": <instance-name>
    host: "yes"
If such a pod is not running on one of your worker nodes, then your pod can't be scheduled.
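To confirm that, you can list the pods carrying those labels and see which nodes they are running on (the instance name and namespace are placeholders for your own values; note that pod affinity only matches pods in the incoming pod's own namespace unless the affinity term specifies otherwise):
kubectl get pods -n <namespace> -o wide -l app.kubernetes.io/instance=<instance name>,host=yes
If that list is empty on the worker nodes at the moment the new pod is scheduled, you will see exactly the "16 node(s) didn't match pod affinity rules" failure from your question.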
CodePudding user response:
You should be using nodeAffinity ("schedule me on a node I prefer") instead of podAffinity ("colocate me on a node with a particular pod I prefer").
Node Affinity configuration for your use case would look something like this under pod.spec.affinity:
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - "<INSTANCE HOSTNAME>"
I do caution you against using this approach, though. Forcing your pod onto specific nodes can be problematic (e.g. the scheduler may not be able to resolve other affinity constraints or handle taints).
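If you do want to steer the pod toward a particular node without hard-failing when that node isn't available, a softer variant is preferredDuringSchedulingIgnoredDuringExecution, which scores matching nodes higher but still lets the pod land elsewhere. A minimal sketch (the hostname value is a placeholder):
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100   # 1-100; higher weight = stronger preference
        preference:
          matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - "<INSTANCE HOSTNAME>"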
It may also be unnecessary. By default, the Kubernetes scheduler already favors the least-allocated node: NodeResourcesFit is a scheduler plugin that ranks nodes based on available resources and the pod's requests, and its scoring strategy defaults to LeastAllocated.
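For reference, that default corresponds roughly to the following scheduler configuration. This is a sketch only, and the apiVersion depends on your Kubernetes version (kubescheduler.config.k8s.io/v1 from 1.25 onward, v1beta3 before that):
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: default-scheduler
    pluginConfig:
      - name: NodeResourcesFit
        args:
          scoringStrategy:
            type: LeastAllocated   # the default: nodes with more free CPU/memory score higher
            resources:
              - name: cpu
                weight: 1
              - name: memory
                weight: 1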