I'm trying to troubleshoot an issue in Kubernetes where, after a Job fails, the associated Pod seemingly disappears and I can't view its logs. The Job still exists, though.
But that's not what my question is about. In reading through the documentation, I noticed that it seemingly uses the terms "terminated" and "deleted" interchangeably, which has left me very confused. I would assume that terminated pods are not necessarily deleted, but the way the documentation is written implies that a terminated pod and a deleted pod are the same thing.
Example 1: https://kubernetes.io/docs/concepts/workloads/controllers/job/#job-termination-and-cleanup
"When a Job completes, no more Pods are created, but the Pods are usually not deleted either."
"usually" then links to https://kubernetes.io/docs/concepts/workloads/controllers/job/#pod-backoff-failure-policy which then describes the logic by which pods will be terminated. So here, a link to a section which purports to describe the logic by which pods will be deleted, instead describes the logic by which pods will be terminated, implying they are one and the same.
Example 2: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination-forced
This section is titled "Forced Pod termination" and proceeds to explain what happens when you attempt to force delete a pod, again implying that terminating and deleting a pod are one and the same.
Example 3: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination
This section, titled "Termination of Pods", describes what happens when the user requests deletion of a pod.
The job in question is failing due to DeadlineExceeded.
The documentation states "Once a Job reaches activeDeadlineSeconds, all of its running Pods are terminated and the Job status will become type: Failed with reason: DeadlineExceeded." If terminated and deleted mean the same thing, then that would explain why my pods are gone. I find that a strange design choice, but it at least would explain my problem.
The Kubernetes documentation asked me if the documentation was helpful. I said "no" and it told me to create a question on Stack Overflow, so that's what I'm doing :)
CodePudding user response:
As @karthikeayan said, delete and terminate are the same. And yes, your pods were deleted because activeDeadlineSeconds was exceeded.
If your Job hits an error and its restartPolicy is not Never, the pods created by the Job will be deleted.
restartPolicy can be set to OnFailure, which tells Kubernetes to keep restarting until the Job completes successfully. However, the number of failures does not rise with each retry. To prevent an endless loop of failures, you can set activeDeadlineSeconds to a value.
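To make the two settings above concrete, here is a minimal sketch that builds a Job manifest with both restartPolicy and activeDeadlineSeconds set. The helper name build_job_manifest, the Job name example-job, the busybox image, the sleep command, and the backoffLimit of 4 are all assumptions made up for illustration; only the field names come from the batch/v1 Job API.

```python
import json


def build_job_manifest(name, image, command, deadline_seconds, restart_policy="Never"):
    """Build a batch/v1 Job manifest as a plain dict (hypothetical helper)."""
    # A Job's pod template only accepts Never or OnFailure as restartPolicy.
    if restart_policy not in ("Never", "OnFailure"):
        raise ValueError("restartPolicy for a Job must be Never or OnFailure")
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Upper bound, in seconds, on how long the Job may stay active.
            "activeDeadlineSeconds": deadline_seconds,
            # Illustrative retry limit; an assumption, not taken from the question.
            "backoffLimit": 4,
            "template": {
                "spec": {
                    "restartPolicy": restart_policy,
                    "containers": [
                        {"name": name, "image": image, "command": command}
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    manifest = build_job_manifest(
        "example-job", "busybox", ["sh", "-c", "sleep 300"], deadline_seconds=120
    )
    # Print the manifest as JSON so it can be saved to a file or piped onward.
    print(json.dumps(manifest, indent=2))
```

Since kubectl accepts JSON as well as YAML, the printed output could be piped to kubectl apply -f - to try this against a test cluster.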
You have already researched this well and gathered good information. To find the logs of a deleted pod, follow this stack link; otherwise, the best approach is to centralize your logs via logging agents, or to push them directly to an external service, as suggested by @Jonas.