I have a CronJob that keeps restarting, despite its restartPolicy set to Never:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: cron-zombie-pod-killer
spec:
  schedule: "*/9 * * * *"
  successfulJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        metadata:
          name: cron-zombie-pod-killer
        spec:
          containers:
            - name: cron-zombie-pod-killer
              image: bitnami/kubectl
              command:
                - "/bin/sh"
              args:
                - "-c"
                - "kubectl get pods --all-namespaces --field-selector=status.phase=Failed | awk '{print $2 \" --namespace=\" $1}' | xargs kubectl delete pod > /dev/null"
          serviceAccountName: pod-read-and-delete
          restartPolicy: Never
I would expect it to run every 9th minute, but that's not the case. What happens is that when there are pods to clean up (i.e. when there is something for the pod to do) it runs normally. Once everything is cleared up, it keeps restarting -> failing -> starting, and so on, in a loop every second.
Is there something I need to do to tell k8s that the job has been successful, even if there's nothing to do (no pods to clean up)? What makes the job loop in restarts and failures?
CodePudding user response:
...Once everything is cleared up, it keeps restarting -> failing -> starting, etc. in a loop every second.
When your first command returns no pods, the trailing commands (e.g. awk, xargs) fail and return a non-zero exit code. The controller perceives that exit code as a failed job and therefore starts a new pod to re-run it. You should simply exit with zero when no pods are returned.
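You can see this outside the Job as well. A reproduction sketch, assuming GNU xargs (as in the Debian-based bitnami/kubectl image) and a cluster with no Pods in Failed state:

# With empty input, GNU xargs still invokes `kubectl delete pod` once with
# no pod names; kubectl errors out, so xargs exits with status 123, which
# the Job controller treats as a failed container.
kubectl get pods --all-namespaces --field-selector=status.phase=Failed \
  | awk '{print $2 " --namespace=" $1}' \
  | xargs kubectl delete pod > /dev/null
echo "exit code: $?"   # non-zero (123) when there was nothing to delete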
CodePudding user response:
That is by design. restartPolicy is not applied to the CronJob, but to the Pods it creates. If restartPolicy is set to Never, the Job controller will just create new Pods if the previous one failed. Setting it to OnFailure causes the Pod itself to be restarted, and prevents the stream of new Pods.
This was discussed in this GitHub issue: Job being constantly recreated despite RestartPolicy: Never #20255
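If you prefer that behaviour, the only change in your manifest would be the pod template's restartPolicy. A sketch of just the relevant part:

        spec:
          containers:
            - name: cron-zombie-pod-killer
              image: bitnami/kubectl
              # command and args unchanged from the manifest above
          serviceAccountName: pod-read-and-delete
          restartPolicy: OnFailure   # failed containers are restarted in place, no stream of new Pods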
Your kubectl pipeline results in exit code 123 (xargs' status for "any invocation exited with a non-zero status") when there are no Pods in Failed state. This causes the Job to fail, and the constant restarts.
You can fix that by forcing the kubectl command to exit with code 0. Add || exit 0 to the end of it:
kubectl get pods --all-namespaces --field-selector=status.phase=Failed | awk '{print $2 \" --namespace=\" $1}' | xargs kubectl delete pod > /dev/null || exit 0
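Dropped back into the args of your manifest, that looks like this (same pipeline, only || exit 0 appended inside the quoted string):

              args:
                - "-c"
                - "kubectl get pods --all-namespaces --field-selector=status.phase=Failed | awk '{print $2 \" --namespace=\" $1}' | xargs kubectl delete pod > /dev/null || exit 0"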