Controlling pod recovery from "Error: ImagePullBackOff" when Container Registry is also i

Time:02-27

We had a major outage when both our container registry and the entire K8s cluster lost power. Because the cluster came back faster than the container registry, my pod (part of a StatefulSet) got stuck in "Error: ImagePullBackOff".

Is there a config setting that makes Kubernetes periodically retry downloading the image from the CR, or otherwise recover without manual intervention?

I looked at imagePullPolicy, but that does not apply to a situation where the CR is unavailable.

Answer:

The BackOff part of the ImagePullBackOff status means that Kubernetes keeps trying to pull the image from the registry, with an exponential back-off delay (10s, 20s, 40s, …). The delay doubles after each failed attempt until it reaches a compiled-in limit of 300 seconds (5 minutes); more on this in the Kubernetes docs. So once the registry is reachable again, the pod should recover on its own at the next retry, without manual intervention.

The backOffPeriod parameter for image pulls is a hard-coded constant in Kubernetes and unfortunately is not tunable at the moment, since changing it could affect node performance. The only way to adjust it is to change the code and build a custom kubelet binary. There is an ongoing issue about making it adjustable.
