What happens when a Kubernetes liveness probe returns false? Does Kubernetes restart that pod immediately?
CodePudding user response:
First, please note that livenessProbe concerns containers in the pod, not the pod itself. So if you have multiple containers in one pod, only the affected container will be restarted.
It's worth noting that there is a parameter, failureThreshold, which is set to 3 by default. So, after 3 failed probes a container will be restarted:
failureThreshold: When a probe fails, Kubernetes will try failureThreshold times before giving up. Giving up in case of liveness probe means restarting the container. In case of readiness probe the Pod will be marked Unready. Defaults to 3. Minimum value is 1.
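To make the timing concrete, here is a minimal sketch of the probe fields that decide how long a container can keep failing before it is restarted. The probe command is only illustrative, and the values are the documented defaults written out explicitly, so treat the arithmetic in the comments as an approximation rather than a guarantee.

```yaml
# Fragment of a container spec (illustrative, not a complete manifest).
livenessProbe:
  exec:
    command:
    - cat
    - /tmp/healthy
  initialDelaySeconds: 0   # default: start probing as soon as the container starts
  periodSeconds: 10        # default: run the probe every 10 seconds
  timeoutSeconds: 1        # default: a probe taking longer than 1 second counts as failed
  failureThreshold: 3      # default: give up only after 3 consecutive failures
# With these values the restart is triggered on the order of
# periodSeconds * failureThreshold = 10 * 3 = 30 seconds
# after the probe starts failing, plus the time termination itself takes.
```

Lowering periodSeconds or failureThreshold shortens that window, at the cost of reacting to short, transient failures.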
OK, so we know that a container is restarted after 3 failed probes, but what does it mean to restart?
I found a good article about how Kubernetes terminates pods: Kubernetes best practices: terminating with grace. It seems that a container restart caused by a liveness probe works similarly; I will share my experience below.
So basically, when a container is being terminated because of a failed liveness probe, the steps are:
- If there is a PreStop hook, it will be executed
- SIGTERM signal is sent to the container
- Kubernetes waits for the grace period
- After the grace period, SIGKILL signal is sent to the container
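The steps above map onto two settings in the pod spec, sketched below. The pod name, container name, and the hook command are hypothetical and only there for illustration; the grace period is the default written out explicitly.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: graceful-demo                  # hypothetical name
spec:
  terminationGracePeriodSeconds: 30    # the default; SIGKILL follows once this runs out
  containers:
  - name: web                          # hypothetical name
    image: nginx
    lifecycle:
      preStop:
        exec:
          # Hypothetical hook: it runs first, and SIGTERM is sent after it finishes.
          command: ["/bin/sh", "-c", "sleep 5"]
```

As far as I know, the time spent in the preStop hook counts against the grace period, so a slow hook leaves the application less time to react to SIGTERM.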
So, if the app in your container handles the SIGTERM signal properly, the container will shut down and be started again. Typically this happens pretty fast (as I tested with the NGINX image), almost immediately.
The situation is different when SIGTERM is not handled by your application. In that case the SIGKILL signal is sent after the terminationGracePeriodSeconds period, which means the container will be forcibly terminated.
The example below is a modified example from this doc; I set failureThreshold: 1.
I have the following pod definition:
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  containers:
  - name: liveness
    image: nginx
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      periodSeconds: 10
      failureThreshold: 1
Of course there is no /tmp/healthy file, so the livenessProbe will fail. The NGINX image handles the SIGTERM signal properly, so the container will be restarted almost immediately (after every failed probe). Let's check it:
user@shell:~/liveness-test-short $ kubectl get pods
NAME READY STATUS RESTARTS AGE
liveness-exec 0/1 CrashLoopBackOff 3 36s
So after ~30 sec the container has already been restarted a few times and its status is CrashLoopBackOff, as expected. I created the same pod without the livenessProbe and measured the time needed to shut it down:
user@shell:~/liveness-test-short $ time kubectl delete pod liveness-exec
pod "liveness-exec" deleted
real 0m1.474s
So it's pretty fast.
A similar example, but with a sleep 3000 command added:
...
    image: nginx
    args:
    - /bin/sh
    - -c
    - sleep 3000
...
Let's apply it and check...
user@shell:~/liveness-test-short $ kubectl get pods
NAME READY STATUS RESTARTS AGE
liveness-exec 1/1 Running 5 3m37s
So after ~4 min there are only 5 restarts. Why? Because we need to wait for the full terminationGracePeriodSeconds period (default is 30 seconds) on every restart. Let's measure the time needed to shut it down:
user@shell:~/liveness-test-short $ time kubectl delete pod liveness-exec
pod "liveness-exec" deleted
real 0m42.418s
It's much longer.
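For comparison, here is a sketch of how the sleep variant could be made to react to SIGTERM, so that it no longer has to wait for the full grace period. I have not measured this variant; the pod name and the trap script are my own assumptions, not part of the test above.

```yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec-trap     # hypothetical name
spec:
  containers:
  - name: liveness
    image: nginx
    command: ["/bin/sh", "-c"]
    # The trap makes the shell exit on SIGTERM instead of ignoring it.
    # Running sleep in the background and calling wait keeps the shell
    # able to run the trap while it is waiting.
    args:
    - |
      trap 'exit 0' TERM
      sleep 3000 &
      wait
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      periodSeconds: 10
      failureThreshold: 1
```

The idea is simply that the main process in the container has to install a SIGTERM handler; a real application would do its cleanup in that handler before exiting.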
To sum up:
What happens when a Kubernetes liveness probe returns false? Does Kubernetes restart that pod immediately?
The short answer is: by default, no. Why?
- Kubernetes will restart a container in a pod only after failureThreshold failed probes. By default that is 3, so after 3 failed probes.
- Depending on the configuration of your container, the time needed for container termination can vary a lot.
- You can adjust both the failureThreshold and terminationGracePeriodSeconds parameters, so that the container is restarted immediately after every failed probe.
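As a sketch of that last point, the pod definition from the test above could be adjusted as follows. The grace period of 5 seconds is an assumed value picked for illustration, not something I measured.

```yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  terminationGracePeriodSeconds: 5   # assumed value; the default is 30
  containers:
  - name: liveness
    image: nginx
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      periodSeconds: 10
      failureThreshold: 1            # restart after the first failed probe
```

Keep in mind that with failureThreshold: 1 a single slow or flaky probe is enough to trigger a restart, and a short grace period gives the application less time to shut down cleanly.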