I have an application deployed to Kubernetes. Here is the tech stack: Java 11, Spring Boot 2.3.x or 2.5.x, Hikari 3.x or 4.x.
I'm using Spring Actuator for the health checks. Here is the liveness and readiness configuration within application.yaml:
management:
  endpoint:
    health:
      group:
        liveness:
          include: '*'
          exclude:
            - db
            - readinessState
        readiness:
          include: '*'
What this does if the DB is down:
- Makes sure liveness is not impacted, i.e. the application container keeps running even during a DB outage.
- Readiness is impacted, making sure no traffic is allowed to hit the container.
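For reference, here is a minimal sketch of the related properties that expose these probe endpoints in Spring Boot 2.3+. On Kubernetes they are usually enabled automatically, so treat this as an explicit opt-in example rather than something missing from the setup above:

management:
  endpoint:
    health:
      probes:
        enabled: true        # exposes /actuator/health/liveness and /actuator/health/readiness
  health:
    livenessstate:
      enabled: true          # registers the LivenessState health indicator
    readinessstate:
      enabled: true          # registers the ReadinessState health indicator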
Liveness and readiness probe configuration in the container spec:
livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8443
    scheme: HTTPS
  initialDelaySeconds: 30
  periodSeconds: 30
  timeoutSeconds: 5
readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8443
    scheme: HTTPS
  initialDelaySeconds: 30
  periodSeconds: 30
  timeoutSeconds: 20
My application started and has been running fine for a few hours.
What I did:
I brought down the DB.
Issue noticed:
When the DB is down, after 90 seconds I see 3 more pods getting spun up. When a pod is described, I see status and conditions like below:
Status:       Running
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
When I list all running pods:
NAME READY STATUS RESTARTS AGE
application-a-dev-deployment-success-5d86b4bcf4-7lsqx 0/1 Running 0 6h48m
application-a-dev-deployment-success-5d86b4bcf4-cmwd7 0/1 Running 0 49m
application-a-dev-deployment-success-5d86b4bcf4-flf7r 0/1 Running 0 48m
application-a-dev-deployment-success-5d86b4bcf4-m5nk7 0/1 Running 0 6h48m
application-a-dev-deployment-success-5d86b4bcf4-tx4rl 0/1 Running 0 49m
My analysis/finding:
Per the readinessProbe configuration, periodSeconds is set to 30 seconds and failureThreshold defaults to 3 per the Kubernetes documentation.
Per application.yaml, the readiness group includes the db check, meaning the readiness check fails every 30 seconds. Once it fails 3 times, the failureThreshold is met and new pods are spun up.
The restart policy defaults to Always.
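To make that timing explicit, here is the same readinessProbe with the documented Kubernetes defaults written out (failureThreshold and successThreshold are not in my manifest; the values shown are the defaults):

readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8443
    scheme: HTTPS
  initialDelaySeconds: 30
  periodSeconds: 30        # probe every 30s
  timeoutSeconds: 20
  failureThreshold: 3      # default: 3 consecutive failures mark the pod NotReady
  successThreshold: 1      # default: 1 success marks it Ready again
# 3 failures x 30s period = roughly the 90 seconds observed before the pods showed up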
Questions:
- Why did it spin up new pods?
- Why did it spin up exactly 3 pods, and not 1, 2, 4, or any other number?
- Does this have anything to do with restartPolicy?
CodePudding user response:
- As you answered yourself, it spun up new pods after 3 failed tries, according to failureThreshold. You can change your restartPolicy to OnFailure, which restarts the container only when it fails, or to Never if you don't want the container to be restarted at all. The difference between the policies is described here. Note this:
The restartPolicy applies to all containers in the Pod. restartPolicy only refers to restarts of the containers by the kubelet on the same node. After containers in a Pod exit, the kubelet restarts them with an exponential back-off delay (10s, 20s, 40s, …), that is capped at five minutes. Once a container has executed for 10 minutes without any problems, the kubelet resets the restart backoff timer for that container.
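For reference, restartPolicy sits at the pod spec level, alongside the containers, not on the probe itself; a minimal sketch with placeholder names:

apiVersion: v1
kind: Pod
metadata:
  name: application-a            # hypothetical name
spec:
  restartPolicy: OnFailure       # Always (default) | OnFailure | Never
                                 # note: pods managed by a Deployment only support Always
  containers:
    - name: application-a
      image: application-a:latest   # hypothetical image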
- Share your full Deployment file; I suppose you've set the replicas number to 3 (a minimal sketch of that field follows below).
- Answered in the answer to the 1st question.
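A minimal sketch of where that number lives; everything except the Deployment name is a placeholder, not taken from the question:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: application-a-dev-deployment-success
spec:
  replicas: 3                    # number of pods the Deployment keeps running
  selector:
    matchLabels:
      app: application-a
  template:
    metadata:
      labels:
        app: application-a
    spec:
      containers:
        - name: application-a
          image: application-a:latest   # hypothetical image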
Also note this, in case it works for you:
Startup probes are useful for Pods that have containers that take a long time to come into service. Rather than set a long liveness interval, you can configure a separate configuration for probing the container as it starts up, allowing a time longer than the liveness interval would allow.
If your container usually starts in more than initialDelaySeconds + failureThreshold × periodSeconds, you should specify a startup probe that checks the same endpoint as the liveness probe. The default for periodSeconds is 10s. You should then set its failureThreshold high enough to allow the container to start, without changing the default values of the liveness probe. This helps to protect against deadlocks.
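A minimal sketch of such a startup probe against the same endpoint as the liveness probe (the failureThreshold value here is an assumption to be tuned to your actual startup time):

startupProbe:
  httpGet:
    path: /actuator/health/liveness   # same endpoint as the liveness probe
    port: 8443
    scheme: HTTPS
  periodSeconds: 10        # default period
  failureThreshold: 30     # assumed: allows up to 30 x 10s = 300s for startup
# liveness and readiness probes only begin once the startup probe has succeeded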
CodePudding user response:
The crux lay in the HPA. After the readiness failure, the CPU utilization of the pod used to jump, and as it went above 70%, the HPA was triggered and started those 3 pods.
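For anyone hitting the same thing, a rough sketch of an HPA like the one in play here, targeting 70% CPU (autoscaling/v2 syntax; minReplicas/maxReplicas are assumptions that would match the observed 2 → 5 scale-out, not values from my cluster):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: application-a-hpa              # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: application-a-dev-deployment-success
  minReplicas: 2
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70       # scale out when average CPU goes above 70%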