Schedule Kubernetes pods in the same failure zone


We have a deployment with a large number of replicas (> 1) that we must deploy in the same zone.

We stumbled upon this documentation section: https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#an-example-of-a-pod-that-uses-pod-affinity, which explains how to schedule pods in zones that already have other pods matching certain labels.

However, there are no other pods that our deployment depends on. All other workloads are replicated and spread across multiple zones, and this is the first deployment that we would like to keep in a single zone.

We also thought about explicitly setting the zone for this deployment, but in case of a zone failure it would become unavailable until we noticed and explicitly moved it to another zone, so pinning it to an exact zone won't work here.

Any insights here? Thanks!

CodePudding user response:

Pod affinity affects how a pod is scheduled based on the presence or absence of other pods within a topology domain (a node, a zone, and so on, depending on the topologyKey). That would probably not serve the purpose you're trying to achieve.

You're probably better off using node affinity (it's covered in the same link you provided).

That would allow you to force the deployment into a zone, because each GKE node will have a failure-domain label, which you can find by running this and looking through the results:

kubectl get node {name-of-node} -o json | jq ".metadata.labels"

The labels will read something like this:

  "failure-domain.beta.kubernetes.io/region": "europe-west2",
  "failure-domain.beta.kubernetes.io/zone": "europe-west2-b",

You can then combine this with nodeAffinity in your deployment YAML (parts snipped for brevity):

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    ...
  annotations:
    ...
  name: my-deployment
spec:
  replicas: 1
  strategy: {}
  selector:
    matchLabels:
      ...
  template:
    metadata:
      annotations:
        ...
      labels:
        ...
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: failure-domain.beta.kubernetes.io/zone
                operator: In
                values:
                - europe-west2-b

This will force all the pods generated by the deployment onto nodes sitting in europe-west2-b.

You could change this to the following:

    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: failure-domain.beta.kubernetes.io/zone
                operator: In
                values:
                - europe-west2-b
                - europe-west2-c

This allows it to schedule in either of the two zones (but, as a consequence, it would not be able to schedule onto the europe-west2-a zone).

CodePudding user response:

I do not think there is a direct way to achieve this. I can think of two ways this can work.

Using Affinity on Pod and Node

  1. Adding node affinity with preferredDuringSchedulingIgnoredDuringExecution for the zones you would want to target.
  2. Adding pod affinity to the deployment's own pods with preferredDuringSchedulingIgnoredDuringExecution, so the pods prefer to be scheduled alongside each other.

With this, when the first pod is about to be spun up it will match none of its preferred affinities, but the scheduler will still schedule it. Once one pod is running, the remaining pods will see a pod with the matching labels and should all land in the same zone. The challenge is the possibility of a race condition: if several pods are scheduled at roughly the same time while your first preferred zone is out, the scheduler may place them in different zones.
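As a rough sketch (not from the original answer) of what that combined affinity block might look like, assuming the pod template is labelled app: my-app and reusing the zone label key and zone names from the first answer, with placeholder weights:

    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 50
            preference:
              matchExpressions:
              # prefer (but do not require) these zones; the names are placeholders
              - key: failure-domain.beta.kubernetes.io/zone
                operator: In
                values:
                - europe-west2-b
                - europe-west2-c
        podAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              # prefer landing in the same zone as existing pods of this deployment
              topologyKey: failure-domain.beta.kubernetes.io/zone
              labelSelector:
                matchLabels:
                  app: my-app   # assumed pod label; match your own template labels

The weights are arbitrary; giving the pod self-affinity the higher weight biases the scheduler toward keeping the replicas together even if the preferred zones become unavailable.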

Using Webhooks

  1. You can use a mutating webhook that checks the node labels and adds a requiredDuringSchedulingIgnoredDuringExecution affinity to pods based on which zones you still have available.

The challenge here is that you would most likely need to write and maintain this webhook yourself. I am not sure you will find your exact use case solved by someone else in open source. A quick search shows me this repo; I have not tested it, but it might give you a start.
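For orientation only, the registration side of such a webhook might look roughly like the configuration below. This is not from the answer above: the names, namespace, path, and opt-in label are all hypothetical, and the service that actually inspects zone availability and patches each pod's affinity still has to be written and deployed separately.

apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: zone-pinning-webhook            # hypothetical name
webhooks:
- name: zone-pinning.example.com        # hypothetical webhook identifier
  admissionReviewVersions: ["v1"]
  sideEffects: None
  failurePolicy: Ignore                 # don't block pod creation if the webhook is down
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    operations: ["CREATE"]
    resources: ["pods"]
  clientConfig:
    service:
      namespace: kube-system            # hypothetical; wherever the webhook service runs
      name: zone-pinning-webhook
      path: /mutate
  objectSelector:
    matchLabels:
      pin-to-zone: enabled              # hypothetical opt-in label on the pods

The service behind clientConfig would then look at which zones are currently healthy (or where the deployment's existing pods run) and inject the required node affinity into each incoming pod.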
