Auto delete CrashLoopBackOff pods in a deployment


In my Kubernetes cluster there are multiple deployments in a namespace. For one specific deployment, "CrashLoopBackOff" pods must not be allowed to exist: whenever a pod gets into this state, I want it to be deleted, after which a new pod is created anyway by the ReplicaSet.

I tried a custom controller, with the idea that a SharedInformer would notify me about the pod's state and I would then delete the pod from that loop. However, this introduces a dependency on the pod that the custom controller itself runs in.

I also searched for an option that could be configured in the manifest itself, but could not find any.

I am pretty new to Kubernetes, so I need help implementing this behaviour.

CodePudding user response:

Firstly, you should address the reason why the pod entered the CrashLoopBackOff state rather than just deleting it. Otherwise you will only recreate the problem and end up deleting pods repeatedly. For example, if your pod is trying to reach an external DB and that DB is down, it will CrashLoop, and deleting and restarting the pod won't fix that.
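To find the cause, you can inspect the pod's events and the logs of the previous (crashed) container instance, for example (using te-pod-1 as a placeholder pod name):

kubectl describe pod te-pod-1        # events explain why the pod keeps restarting
kubectl logs te-pod-1 --previous     # logs from the last crashed container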

Secondly, if you want to automate the deletion, an easy way is to run a CronJob resource that goes through the pods of your deployment and deletes the CrashLooping ones. You could schedule the CronJob to run once an hour, or on whatever schedule you wish.
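A minimal sketch of such a CronJob, assuming a namespace my-namespace, pods labelled app=my-app, and a ServiceAccount pod-cleaner (all placeholder names); the bitnami/kubectl image is one convenient way to get kubectl inside a container:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: crashloop-cleaner
  namespace: my-namespace
spec:
  schedule: "0 * * * *"        # once an hour
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: pod-cleaner
          restartPolicy: Never
          containers:
          - name: cleaner
            image: bitnami/kubectl:latest
            command:
            - /bin/sh
            - -c
            # List the deployment's pods with the waiting reason of their
            # containers, keep the CrashLoopBackOff ones, and delete them.
            - >
              kubectl get pods -n my-namespace -l app=my-app
              -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.containerStatuses[*].state.waiting.reason}{"\n"}{end}'
              | grep CrashLoopBackOff
              | awk '{print $1}'
              | xargs -r kubectl delete pod -n my-namespace

Note that the pod-cleaner ServiceAccount needs a Role granting get, list, and delete on pods, bound to it with a RoleBinding, and that with an hourly schedule a crash-looping pod can live for up to an hour before it is cleaned up.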

CodePudding user response:

Deleting the pod and waiting for the new one is effectively the same as restarting the deployment or the pod.

Kubernetes will automatically restart a CrashLoopBackOff pod when it fails; you can see this in the restart count:

NAME        READY    STATUS              RESTARTS    AGE
te-pod-1    0/1      CrashLoopBackOff    2           1m44s

These restarts are essentially what you described:

"when any pod gets to this state, I would want it to be deleted and later a new pod to be created which is already handled by the ReplicaSet."

If you want the pod to stop crashing entirely, rather than waiting for a new pod to come up, you have to roll back the deployment.
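For example, assuming the deployment is named te-deployment (a placeholder), you can inspect its revision history and roll back to the previous revision with:

kubectl rollout history deployment/te-deployment   # list past revisions
kubectl rollout undo deployment/te-deployment      # revert to the previous one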

If the problem lies in your ReplicaSet (Deployment) itself, deleting the pod is useless: no matter how many times you delete and restart it, it will keep crashing until you check the logs and debug the real issue in the ReplicaSet (Deployment).
