Kubernetes: change backoffLimit default value

Time:10-22

Is it possible to configure backoffLimit globally (for example, change the default limit from 6 to 2 for all Jobs in the cluster, without specifying backoffLimit: 2 for each Job)?

CodePudding user response:

It seems that the default values, including the one for .spec.backoffLimit, are hardcoded directly in the Kubernetes source code.

From pkg/apis/batch/v1/defaults.go:

func SetDefaults_Job(obj *batchv1.Job) {
    // For a non-parallel job, you can leave both `.spec.completions` and
    // `.spec.parallelism` unset.  When both are unset, both are defaulted to 1.
    if obj.Spec.Completions == nil && obj.Spec.Parallelism == nil {
        obj.Spec.Completions = utilpointer.Int32Ptr(1)
        obj.Spec.Parallelism = utilpointer.Int32Ptr(1)
    }
    if obj.Spec.Parallelism == nil {
        obj.Spec.Parallelism = utilpointer.Int32Ptr(1)
    }
    if obj.Spec.BackoffLimit == nil {
        obj.Spec.BackoffLimit = utilpointer.Int32Ptr(6)
    }
    labels := obj.Spec.Template.Labels
    if labels != nil && len(obj.Labels) == 0 {
        obj.Labels = labels
    }
    if utilfeature.DefaultFeatureGate.Enabled(features.IndexedJob) && obj.Spec.CompletionMode == nil {
        mode := batchv1.NonIndexedCompletion
        obj.Spec.CompletionMode = &mode
    }
    if utilfeature.DefaultFeatureGate.Enabled(features.SuspendJob) && obj.Spec.Suspend == nil {
        obj.Spec.Suspend = utilpointer.BoolPtr(false)
    }
}

So I think that, at the moment, the default cannot be changed without modifying the code.
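
The nil check in the excerpt above is what makes the value a default rather than an override: 6 is filled in only when the Job leaves the field unset. A minimal sketch of that pattern follows, assuming a hypothetical jobSpec type that stands in for the real batchv1.JobSpec and a hypothetical helper setDefaultBackoffLimit. It is not the Kubernetes code itself.

```go
package main

import "fmt"

// jobSpec is a hypothetical stand-in for batchv1.JobSpec, reduced to the
// single field discussed here. A nil pointer means "not set by the user".
type jobSpec struct {
	BackoffLimit *int32
}

// setDefaultBackoffLimit mirrors the nil check in SetDefaults_Job:
// the hardcoded 6 is applied only when the field was left unset.
func setDefaultBackoffLimit(spec *jobSpec) {
	if spec.BackoffLimit == nil {
		limit := int32(6)
		spec.BackoffLimit = &limit
	}
}

func main() {
	// Job that does not specify backoffLimit.
	unset := jobSpec{}

	// Job that specifies backoffLimit: 2 explicitly.
	two := int32(2)
	explicit := jobSpec{BackoffLimit: &two}

	setDefaultBackoffLimit(&unset)
	setDefaultBackoffLimit(&explicit)

	fmt.Println("unset job:", *unset.BackoffLimit)
	fmt.Println("explicit job:", *explicit.BackoffLimit)
}
```

An explicit value in the Job spec is therefore always kept; only Jobs that omit the field receive the hardcoded 6.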

CodePudding user response:

No, it's not possible, since backoffLimit is configured per Job (in the Job's .spec.backoffLimit), as described in the official documentation:

There are situations where you want to fail a Job after some amount of retries due to a logical error in configuration etc. To do so, set .spec.backoffLimit to specify the number of retries before considering a Job as failed. The back-off limit is set by default to 6. Failed Pods associated with the Job are recreated by the Job controller with an exponential back-off delay (10s, 20s, 40s ...) capped at six minutes. The back-off count is reset when a Job's Pod is deleted or successful without any other Pods for the Job failing around that time.
