What happens when a Kubernetes liveness probe returns false?

What happens when a Kubernetes liveness probe returns false? Does Kubernetes restart the pod immediately?

User response:

First, please note that livenessProbe concerns containers in the pod, not the pod itself. So if you have multiple containers in one pod, only the affected container will be restarted.

It's worth noting that there is a failureThreshold parameter, which defaults to 3. So a container is restarted after 3 consecutive failed probes:

failureThreshold: When a probe fails, Kubernetes will try failureThreshold times before giving up. Giving up in case of liveness probe means restarting the container. In case of readiness probe the Pod will be marked Unready. Defaults to 3. Minimum value is 1.
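To make the defaults concrete, here is a minimal sketch of a probe with its timing fields written out explicitly. The probed file is borrowed from the example later in this answer; the three timing values are the defaults as I understand them, so treat the comments as a reading of the documentation rather than a guarantee.

```yaml
# Container-level fragment; the timing values are the defaults, spelled out.
livenessProbe:
  exec:
    command:
    - cat
    - /tmp/healthy
  periodSeconds: 10     # run the probe every 10 seconds
  timeoutSeconds: 1     # a probe that takes longer than 1 second counts as failed
  failureThreshold: 3   # give up (restart the container) after 3 consecutive failures
```

With these values a container that becomes unhealthy is restarted after about failureThreshold × periodSeconds, that is, up to roughly 30 seconds later, not on the first failed probe.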

OK, so we know that a container is restarted after 3 failed probes. But what does a restart actually involve?

I found a good article about how Kubernetes terminates pods, "Kubernetes best practices: terminating with grace". A container restart caused by a liveness probe seems to work in a similar way; I share my experience below.

So basically, when a container is terminated because of a failed liveness probe, the steps are:

If there is a preStop hook, it is executed
The SIGTERM signal is sent to the container
Kubernetes waits for the grace period
After the grace period, if the container is still running, the SIGKILL signal is sent to it
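The steps above map onto two fields in the pod spec. The following is only a sketch: the pod name, the container name and the preStop command are assumptions chosen for illustration, not something taken from the article mentioned above.

```yaml
# Hypothetical pod; names and the preStop command are illustrative assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: graceful-demo
spec:
  terminationGracePeriodSeconds: 30   # the default; time allowed before SIGKILL
  containers:
  - name: web
    image: nginx
    lifecycle:
      preStop:
        exec:
          # Executed before SIGTERM is sent; asks NGINX to shut down gracefully.
          command: ["/bin/sh", "-c", "nginx -s quit"]
```

As far as I know, the time spent in the preStop hook counts against terminationGracePeriodSeconds, so a slow hook leaves less time for the application to react to SIGTERM.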

So if the app in your container handles the SIGTERM signal properly, the container shuts down and is started again. Typically this happens quickly (I tested it with the NGINX image), almost immediately.

The situation is different when your application does not handle SIGTERM. In that case the SIGKILL signal is sent once the terminationGracePeriodSeconds period has elapsed, which means the container is forcibly killed.

In the example below (modified from the example in this doc) I set failureThreshold: 1.

I have the following pod definition:

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  containers:
  - name: liveness
    image: nginx
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      periodSeconds: 10
      failureThreshold: 1

Of course there is no /tmp/healthy file, so the livenessProbe will fail. The NGINX image handles the SIGTERM signal properly, so the container is restarted almost immediately (after every failed probe). Let's check it:

user@shell:~/liveness-test-short $ kubectl get pods
NAME                                   READY   STATUS             RESTARTS   AGE
liveness-exec                          0/1     CrashLoopBackOff   3          36s

So after ~30 sec the container has already been restarted a few times and its status is CrashLoopBackOff, as expected. I created the same pod without the livenessProbe and measured the time needed to shut it down:

user@shell:~/liveness-test-short $ time kubectl delete pod liveness-exec
pod "liveness-exec" deleted

real    0m1.474s

So it's pretty fast.

A similar example, but this time I added a sleep 3000 command:

...
    image: nginx
    args:
    - /bin/sh
    - -c
    - sleep 3000
...

Let's apply it and check...

user@shell:~/liveness-test-short $ kubectl get pods
NAME                                   READY   STATUS    RESTARTS   AGE
liveness-exec                          1/1     Running   5          3m37s

So after ~4 min there are only 5 restarts. Why? Because every restart has to wait for the full terminationGracePeriodSeconds period (30 seconds by default). Let's measure the time needed to shut down:

user@shell:~/liveness-test-short $ time kubectl delete pod liveness-exec
pod "liveness-exec" deleted

real    0m42.418s

It's much longer.
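For comparison, here is a sketch of how the sleep variant could be made to react to SIGTERM. The trap handler is my own illustrative addition, not part of the test above: without a handler, the process running as PID 1 in the container simply ignores SIGTERM, which is why Kubernetes ends up waiting for the whole grace period.

```yaml
# Container-level fragment; the trap handler is an assumption added for illustration.
image: nginx
args:
- /bin/sh
- -c
# Exit on SIGTERM; "wait" lets the shell run the handler while sleep is still in the background.
- "trap 'exit 0' TERM; sleep 3000 & wait"
```

With a handler like this the container should stop shortly after SIGTERM instead of waiting for SIGKILL, although I have not timed this variant.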

To sum up:

What happens when a Kubernetes liveness probe returns false? Does Kubernetes restart the pod immediately?

The short answer is: by default no. Why?

Kubernetes restarts a container in a pod only after failureThreshold consecutive failed probes. By default that is 3.
Depending on the configuration of your container, the time needed for container termination can vary considerably.
You can adjust both the failureThreshold and terminationGracePeriodSeconds parameters so that the container is restarted almost immediately after a failed probe.
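A minimal sketch of that last point, reusing the pod from the test above. The value of 5 seconds is an arbitrary choice for illustration, not a recommendation.

```yaml
# Hypothetical tuning of the test pod; the values are illustrative only.
apiVersion: v1
kind: Pod
metadata:
  name: liveness-exec
spec:
  terminationGracePeriodSeconds: 5   # wait at most 5 seconds after SIGTERM (default is 30)
  containers:
  - name: liveness
    image: nginx
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      periodSeconds: 10
      failureThreshold: 1            # restart after the first failed probe
```

Keep in mind that a container which keeps failing still ends up in CrashLoopBackOff, as in the first test, so repeated restarts are delayed by the back-off regardless of these settings.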