Recently the same container of several pods in a Deployment restarted with an OOMKilled event. Here is the description of one of the containers:
State:          Running
  Started:      Tue, 15 Feb 2022 23:33:06 +0000
Last State:     Terminated
  Reason:       OOMKilled
  Exit Code:    1
  Started:      Fri, 11 Feb 2022 17:48:21 +0000
  Finished:     Tue, 15 Feb 2022 23:33:05 +0000
Ready:          True
Restart Count:  1
Limits:
  cpu:     1
  memory:  512Mi
Requests:
  cpu:     1
  memory:  512Mi
If the container exceeded its memory limit, I would expect it to exit with code 137, so I guess the container did not reach the limit. My question is: what could have happened if the exit code is 1 and the Reason is OOMKilled?
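For context on the exit codes involved: when a container's main process is killed by SIGKILL, the runtime normally reports exit code 128 + 9 = 137, whereas Python's subprocess module reports the same event as a negative return code. A minimal sketch (not taken from the app, POSIX only) illustrating both conventions:

import signal
import subprocess

# Illustrative only: the child shell kills itself with SIGKILL (signal 9).
child = subprocess.run(["sh", "-c", "kill -9 $$"])
print(child.returncode)            # -9: subprocess reports "terminated by signal 9"
print(128 + signal.SIGKILL.value)  # 137: how a shell or container runtime would report it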
Update: the process is actually a Python app which uses threads; this is the code:
import logging
import subprocess

# args is the command to run; cmd is its printable form (both defined elsewhere)
ret = subprocess.run(args, stderr=subprocess.PIPE, universal_newlines=True, check=False)
if ret.returncode != 0:
    logging.warning("Executing cmd failed: %s, code: %d, stderr: %s", cmd, ret.returncode, ret.stderr)
    raise Exception("Failed")
and the relevant logs from when it was called; it returned with -9:
2022-02-15T23:33:30.510Z WARNING "MainThread - Executing cmd failed: iptables-restore -n -w 3 restore-filter, code: -9, stderr: "
raise Exception("Failed")
Exception: Failed
From the documentation of subprocess.run(): "A negative value -N indicates that the child was terminated by signal N (POSIX only)."
So because the exception was raised, the Python code exited with 1? Probably.
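For what it's worth, a small sketch (not the real app) reproduces this chain of events: the child is killed by SIGKILL (as an OOM kill would do), subprocess.run() returns -9, the wrapper raises, and the unhandled exception makes the interpreter, i.e. the container's main process, exit with status 1:

import subprocess
import sys

# Illustrative only: run a tiny "wrapper" in a separate interpreter. Its child is
# killed by SIGKILL, the wrapper raises, and the interpreter exits with status 1
# because the exception is unhandled.
wrapper = (
    "import subprocess\n"
    "ret = subprocess.run(['sh', '-c', 'kill -9 $$'])\n"
    "if ret.returncode != 0:\n"
    "    raise Exception('Failed')\n"
)
outer = subprocess.run([sys.executable, "-c", wrapper])
print(outer.returncode)  # 1: the exit code the container would report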
CodePudding user response:
Two possible reasons:
Reason #1
The subprocess was killed by the OOM killer (it received SIGKILL (9) from the OOM killer), resulting in the application crashing with exit code 1 and the OOMKilled reason for termination.
Reason #2
If you have initContainers specified, an init container could have been killed by the OOM killer, resulting in the OOMKilled reason, with the application then crashing with exit code 1 due to the bad initialization.
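To tell these two cases apart, one option (a sketch only, assuming the official kubernetes Python client and placeholder pod/namespace names) is to inspect the last terminated state of both regular and init containers:

from kubernetes import client, config

# Sketch, not from the question: print the last terminated state of every
# container and init container in a pod, so you can see whether the OOMKilled
# reason came from the app container or from an init container.
# "my-pod" and "default" are placeholder values.
config.load_kube_config()
v1 = client.CoreV1Api()
pod = v1.read_namespaced_pod("my-pod", "default")
statuses = (pod.status.init_container_statuses or []) + (pod.status.container_statuses or [])
for status in statuses:
    terminated = status.last_state.terminated if status.last_state else None
    if terminated:
        print(status.name, terminated.reason, terminated.exit_code)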
OOM kill is not very well documented in the Kubernetes docs. For example:
Containers are marked as OOM killed only when the init pid gets killed by the kernel OOM killer. There are apps that can tolerate OOM kills of non init processes and so we chose to not track non-init process OOM kills. [source]
I could not find any mention of it anywhere other than this GitHub issue.
The first reason is more probable in my opinion.
A possible solution is to increase the memory limits (if you have any).