How to achieve Automatic Rollback in Kubernetes?

Let's say I have a Deployment. For some reason it stops responding after some time. Is there any way to tell Kubernetes to roll back to the previous version automatically on failure?

CodePudding user response:

You mentioned that:

"I have a Deployment. For some reason it stops responding after some time."

In this case, you can use liveness and readiness probes:

The kubelet uses liveness probes to know when to restart a container. For example, liveness probes could catch a deadlock, where an application is running, but unable to make progress. Restarting a container in such a state can help to make the application more available despite bugs.

The kubelet uses readiness probes to know when a container is ready to start accepting traffic. A Pod is considered ready when all of its containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers.

The probes above may keep a broken version from receiving traffic, since Pods that never become ready are not used as Service backends. However, liveness and readiness probes cannot roll your Deployment back to the previous version. There was a similar issue on GitHub, but I am not sure there will be any progress on it in the near future.
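
As a rough illustration of the probes described above, the sketch below writes a Deployment manifest with an HTTP liveness probe and an HTTP readiness probe. The file name probes-demo.yaml, the probes-demo name, the nginx image and all timing values are assumptions chosen for this example, not recommendations:

```shell
# Write a hypothetical Deployment manifest that defines both probe types.
cat > probes-demo.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: probes-demo            # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: probes-demo
  template:
    metadata:
      labels:
        app: probes-demo
    spec:
      containers:
      - name: web
        image: nginx
        ports:
        - containerPort: 80
        livenessProbe:           # kubelet restarts the container when this fails
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 10
          failureThreshold: 3
        readinessProbe:          # Pod is removed from Service backends while this fails
          httpGet:
            path: /
            port: 80
          periodSeconds: 5
EOF
```

The manifest can then be applied with kubectl apply -f probes-demo.yaml. The probe paths and ports have to match whatever health endpoint your application actually exposes.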

If you really want to automate the rollback process, below I will describe a solution that you may find helpful.

This solution requires running kubectl commands from within the Pod. In short, you can use a script to continuously monitor your Deployments, and when errors occur you can run kubectl rollout undo deployment DEPLOYMENT_NAME.

First, you need to decide how to detect failed Deployments. As an example, I'll treat a Deployment as failed if its rollout is still not complete 10 seconds after the check starts, which the following command tests:
NOTE: You can use a different command depending on your needs.

kubectl rollout status deployment ${deployment} --timeout=10s
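
As one possible alternative check, the sketch below reads the Deployment's Progressing condition and reports whether Kubernetes itself has marked the rollout as stuck (reason ProgressDeadlineExceeded). This assumes you are happy to rely on the Deployment's progressDeadlineSeconds (600 seconds by default) instead of a 10s timeout; the function name rollout_deadline_exceeded is made up for this example:

```shell
# Succeeds (exit 0) when the Deployment's Progressing condition has the
# reason ProgressDeadlineExceeded, fails otherwise.
# rollout_deadline_exceeded is a hypothetical helper name.
rollout_deadline_exceeded() {
    local deployment="$1"
    local reason
    reason=$(kubectl get deployment "${deployment}" \
        -o jsonpath='{.status.conditions[?(@.type=="Progressing")].reason}')
    [ "${reason}" = "ProgressDeadlineExceeded" ]
}
```

It could be used as: if rollout_deadline_exceeded app-1; then echo "app-1 is stuck"; fi. Compared with the timeout-based check, this reacts more slowly unless you lower progressDeadlineSeconds on the Deployment.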

To constantly monitor all Deployments in the default Namespace, we can create a Bash script:

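#!/bin/bash
# Every 60 seconds, check each Deployment in the current Namespace and
# roll back any whose rollout does not complete within the timeout.

while true; do
    sleep 60
    # List Deployment names, excluding the checker itself.
    deployments=$(kubectl get deployments --no-headers -o custom-columns=":metadata.name" | grep -v "deployment-checker")
    echo "====== $(date) ======"
    for deployment in ${deployments}; do
        # "rollout status" exits non-zero if the rollout has not finished before the timeout.
        if ! kubectl rollout status deployment "${deployment}" --timeout=10s >/dev/null 2>&1; then
            echo "Error: ${deployment} - rolling back!"
            kubectl rollout undo deployment "${deployment}"
        else
            echo "Ok: ${deployment}"
        fi
    done
done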
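
One optional refinement, sketched here under my own assumptions: kubectl rollout undo fails when a Deployment has no earlier revision (for example, a brand new Deployment created with a bad image), so the script would retry a hopeless rollback every minute. The helper names has_previous_revision and rollback_if_possible are hypothetical, and the check only looks at the deployment.kubernetes.io/revision annotation, so it does not notice old revisions that were pruned by revisionHistoryLimit:

```shell
# Succeeds when the Deployment's current revision number is greater than 1,
# i.e. at least one earlier revision has existed.
has_previous_revision() {
    local deployment="$1"
    local revision
    revision=$(kubectl get deployment "${deployment}" \
        -o jsonpath='{.metadata.annotations.deployment\.kubernetes\.io/revision}')
    [ "${revision:-0}" -gt 1 ]
}

# Roll back only when there is something to roll back to.
rollback_if_possible() {
    local deployment="$1"
    if has_previous_revision "${deployment}"; then
        kubectl rollout undo deployment "${deployment}"
    else
        echo "Skip: ${deployment} has no previous revision"
    fi
}
```

In the loop, the kubectl rollout undo line could be replaced with rollback_if_possible "${deployment}". The rest of this answer uses the script without this refinement.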

We want to run this script from inside a Pod, so I put it in a ConfigMap, which allows us to mount the script as a volume (see: Using ConfigMaps as files from a Pod):

$ cat check-script-configmap.yml
apiVersion: v1
kind: ConfigMap
metadata:
  name: check-script
data:
  checkScript.sh: |
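    #!/bin/bash
    # Every 60 seconds, check each Deployment in the current Namespace and
    # roll back any whose rollout does not complete within the timeout.

    while true; do
        sleep 60
        # List Deployment names, excluding the checker itself.
        deployments=$(kubectl get deployments --no-headers -o custom-columns=":metadata.name" | grep -v "deployment-checker")
        echo "====== $(date) ======"
        for deployment in ${deployments}; do
            # "rollout status" exits non-zero if the rollout has not finished before the timeout.
            if ! kubectl rollout status deployment "${deployment}" --timeout=10s >/dev/null 2>&1; then
                echo "Error: ${deployment} - rolling back!"
                kubectl rollout undo deployment "${deployment}"
            else
                echo "Ok: ${deployment}"
            fi
        done
    done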

$ kubectl apply -f check-script-configmap.yml
configmap/check-script created
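
If you would rather not paste the script into YAML by hand, kubectl can generate an equivalent ConfigMap from the script file. This is only a sketch: it assumes the script is saved as checkScript.sh in the current directory, and the function name render_check_script_configmap is made up for this example:

```shell
# Print a ConfigMap manifest generated from a local checkScript.sh file.
# --dry-run=client renders the object without creating it in the cluster.
# The file name becomes the key, so the mounted file is still checkScript.sh.
render_check_script_configmap() {
    kubectl create configmap check-script \
        --from-file=checkScript.sh \
        --dry-run=client -o yaml
}
```

The output can be saved (render_check_script_configmap > check-script-configmap.yml) or piped straight into kubectl apply -f -.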

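I've created a separate deployment-checker ServiceAccount bound to the built-in edit ClusterRole, and our Pod will run under this ServiceAccount. Note that the binding below is a ClusterRoleBinding, so it grants edit permissions in every Namespace; a RoleBinding in the default Namespace would be enough for this example.
NOTE: I've created a Deployment instead of a single Pod.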

$ cat all-in-one.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: deployment-checker
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: deployment-checker-binding
subjects:
  - kind: ServiceAccount
    name: deployment-checker
    namespace: default
roleRef:
  kind: ClusterRole
  name: edit
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: deployment-checker
  name: deployment-checker
spec:
  selector:
    matchLabels:
      app: deployment-checker
  template:
    metadata:
      labels:
        app: deployment-checker
    spec:
      serviceAccountName: deployment-checker
      volumes:
        - name: check-script
          configMap:
            name: check-script
      containers:
      - image: bitnami/kubectl
        name: test
        command: ["bash", "/mnt/checkScript.sh"]
        volumeMounts:
        - name: check-script
          mountPath: /mnt

After applying the above manifest, the deployment-checker Deployment was created and started monitoring Deployment resources in the default Namespace:

$ kubectl apply -f all-in-one.yaml
serviceaccount/deployment-checker created
clusterrolebinding.rbac.authorization.k8s.io/deployment-checker-binding created
deployment.apps/deployment-checker created

$ kubectl get deploy,pod | grep "deployment-checker"
deployment.apps/deployment-checker   1/1     1            
pod/deployment-checker-69c8896676-pqg9h   1/1     Running
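
Before relying on the checker, it may be worth confirming that the ServiceAccount really has the permissions the script needs. The sketch below uses kubectl auth can-i with impersonation; the helper names are hypothetical, and the verb list (get, list, watch, patch) is my assumption about what the script's kubectl commands require on Deployments:

```shell
# Ask the API server whether the deployment-checker ServiceAccount may
# perform the given verb on Deployments in the default Namespace.
# Prints "yes" or "no"; the caller needs permission to impersonate.
checker_can() {
    local verb="$1"
    kubectl auth can-i "${verb}" deployments \
        --namespace default \
        --as system:serviceaccount:default:deployment-checker
}

# Check each verb the script is assumed to need.
check_checker_permissions() {
    local verb
    for verb in get list watch patch; do
        printf '%s deployments: ' "${verb}"
        checker_can "${verb}"
    done
}
```

Running check_checker_permissions prints one line per verb; any "no" means the binding is missing or too narrow.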

Finally, we can check how it works. I've created three Deployments (app-1, app-2, app-3):

$ kubectl create deploy app-1 --image=nginx
deployment.apps/app-1 created

$ kubectl create deploy app-2 --image=nginx
deployment.apps/app-2 created

$ kubectl create deploy app-3 --image=nginx
deployment.apps/app-3 created

Then I changed the image of app-1 to a nonexistent one (nnnginx):

$ kubectl set image deployment/app-1 nginx=nnnginx
deployment.apps/app-1 image updated

In the deployment-checker logs we can see that app-1 has been rolled back to the previous version:

$ kubectl logs -f deployment-checker-69c8896676-pqg9h
...
====== Thu Oct  7 09:20:15 UTC 2021 ======
Ok: app-1
Ok: app-2
Ok: app-3
====== Thu Oct  7 09:21:16 UTC 2021 ======
Error: app-1 - rolling back!
deployment.apps/app-1 rolled back
Ok: app-2
Ok: app-3
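
To double-check the result outside the checker logs, you could inspect the Deployment directly. A minimal sketch, with hypothetical helper names, that prints the image of the first container and the recorded rollout revisions:

```shell
# Print the image of the first container in the Deployment's Pod template.
deployment_image() {
    kubectl get deployment "$1" \
        -o jsonpath='{.spec.template.spec.containers[0].image}'
}

# Print the revision history that "kubectl rollout undo" works from.
rollout_revisions() {
    kubectl rollout history deployment "$1"
}
```

If the rollback worked, deployment_image app-1 should print nginx again rather than nnnginx, and rollout_revisions app-1 lists the revisions Kubernetes has kept for the Deployment.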