How does the failureThreshold work in liveness & readiness probes? Does it have to be consecutive failures?

I'm unable to find any references other than this link that confirms that the failure has to be consecutive. https://github.com/kubernetes/website/issues/37414

Background: Our Java application is restarted every day because of liveness probe failures. The application's access logs don't show 3 consecutive failures, so I wanted to understand how the probes behave.

CodePudding user response：

The liveness check is set up when Kubernetes creates the Pod and starts over each time the container is restarted. In your configuration you have set initialDelaySeconds: 20, so after the container starts, Kubernetes waits 20 seconds before running the first liveness probe. It then probes every periodSeconds. After failureThreshold consecutive failures (3 by default), Kubernetes restarts the container according to the Pod's restartPolicy. You should also be able to find the failed probes in the Pod's events.

Note that kubectl get events only returns events from the last hour, because by default the API server retains events for one hour.

kubectl get events

LAST SEEN   TYPE      REASON      OBJECT
47m         Normal    Starting    node/kubeadm
43m         Normal    Scheduled   pod/liveness-http
43m         Normal    Pulling     pod/liveness-http
43m         Normal    Pulled      pod/liveness-http
43m         Normal    Created     pod/liveness-http
43m         Normal    Started     pod/liveness-http
4m41s       Warning   Unhealthy   pod/liveness-http
40m         Warning   Unhealthy   pod/liveness-http
12m20s      Warning   BackOff     pod/liveness-http

The same command after ~1 hour:

LAST SEEN   TYPE      REASON      OBJECT
43s         Normal    Pulling     pod/liveness-http
8m40s       Warning   Unhealthy   pod/liveness-http
20m         Warning   BackOff     pod/liveness-http

So that might be the reason you are seeing only one failure.

Liveness probe can be configured using the fields below:

initialDelaySeconds: Number of seconds after the container has started before liveness or readiness probes are initiated. Defaults to 0 seconds. Minimum value is 0.
periodSeconds: How often (in seconds) to perform the probe. Defaults to 10 seconds. Minimum value is 1.
timeoutSeconds: Number of seconds after which the probe times out. Defaults to 1 second. Minimum value is 1.
successThreshold: Minimum consecutive successes for the probe to be considered successful after having failed. Defaults to 1. Must be 1 for liveness. Minimum value is 1.
failureThreshold: When a probe fails, Kubernetes will try failureThreshold times before giving up. Giving up in case of liveness probe means restarting the container. In case of a readiness probe the Pod will be marked Unready. Defaults to 3. Minimum value is 1.

If you set the minimal values for periodSeconds, timeoutSeconds, successThreshold and failureThreshold you can expect more frequent checks and faster restarts.
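
As a rough illustration of how these fields affect restart speed, the following Python sketch estimates the time from the first failing probe until the restart decision. It assumes probes fire exactly every periodSeconds, that each failing probe takes at most timeoutSeconds, and that timeoutSeconds does not exceed periodSeconds. The function name seconds_to_restart_decision is made up for this example and is not part of any Kubernetes API.

```python
def seconds_to_restart_decision(period_seconds: int = 10,
                                timeout_seconds: int = 1,
                                failure_threshold: int = 3) -> int:
    """Approximate upper bound, in seconds, from the start of the first
    failing probe until failure_threshold consecutive failures are seen.

    Assumption: probes are spaced exactly period_seconds apart and the
    last one needs up to timeout_seconds before it counts as failed.
    """
    return (failure_threshold - 1) * period_seconds + timeout_seconds


# Default probe settings: periodSeconds=10, timeoutSeconds=1, failureThreshold=3
print(seconds_to_restart_decision())         # 21
# Minimal settings: every field set to 1
print(seconds_to_restart_decision(1, 1, 1))  # 1
```

Real timings will usually be somewhat longer than this estimate, since the application can become unhealthy at any point between two probes and terminating the container takes additional time.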

Liveness probe:

Kubernetes will restart a container in a Pod after failureThreshold consecutive failed probes. By default this is 3, so the restart happens after 3 failed probes in a row.
Depending on how the container is configured, the time needed for container termination can vary considerably.
You can adjust both failureThreshold and terminationGracePeriodSeconds. For example, with failureThreshold set to 1 and a short terminationGracePeriodSeconds, the container is restarted almost immediately after a single failed probe.

You can find more information in the documentation on liveness probe configuration and best practices.

CodePudding user response：

Yes, the failures have to be consecutive, according to the API docs for failureThreshold:

Minimum consecutive failures for the probe to be considered failed after having succeeded. Defaults to 3. Minimum value is 1.
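
To make the word "consecutive" concrete, here is a minimal Python sketch of a failure counter that resets on every successful probe. It is only a model of the behavior the API docs describe, not kubelet code, and the function name first_restart_index is hypothetical.

```python
from typing import Iterable, Optional


def first_restart_index(results: Iterable[bool],
                        failure_threshold: int = 3) -> Optional[int]:
    """Return the index of the probe result that would trigger a restart,
    or None if the threshold is never reached.

    results: probe outcomes in order, True = success, False = failure.
    Assumption: any success resets the run of failures to zero.
    """
    consecutive_failures = 0
    for i, ok in enumerate(results):
        if ok:
            consecutive_failures = 0  # a success breaks the run
        else:
            consecutive_failures += 1
            if consecutive_failures >= failure_threshold:
                return i
    return None


# Three failures separated by successes never reach the threshold.
print(first_restart_index([False, True, False, True, False, True]))  # None
# Three failures in a row trigger a restart on the third one (index 3).
print(first_restart_index([True, False, False, False]))              # 3
```

Under this model, scattered failures in the access logs would not cause a restart on their own. Note, however, that a probe that times out may never show up in the application's access log as a failure.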