Pod restart in OpenShift after deployment

A few pods in my OpenShift cluster are restarted multiple times after deployment.

The describe output shows:
Last State: Terminated
Reason: OOMKilled
Exit Code: 137

Also, memory usage is well below the memory limits. Is there any other parameter I should check?

There are no issues with the cluster in terms of resources.

CodePudding user response：

"OOMKilled" means your container memory limit was reached and the container was therefore restarted.

Java-based applications in particular can consume a large amount of memory when starting up. After startup, the memory usage often drops considerably.

So in your case, increase 'resources.limits.memory' to avoid these OOMKills. Note that 'resources.requests.memory' can still be lower and should roughly reflect what your container consumes after startup.
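As a rough sketch, the limit and request can be changed from the command line with oc set resources. The helper name set_memory, the workload name deployment/my-app and the sizes 1Gi and 512Mi are placeholders for illustration, not recommendations; choose values based on what the container uses during and after startup.

```shell
#!/bin/sh
# Sketch: raise the memory limit while keeping a lower request.
# Every name and size passed in is a placeholder chosen by the caller.
set_memory() {
    workload="$1"   # e.g. deployment/my-app (hypothetical name)
    limit="$2"      # hard cap; should cover the startup peak
    request="$3"    # scheduling hint; roughly the steady-state usage
    oc set resources "$workload" \
        --limits="memory=$limit" \
        --requests="memory=$request"
}

# Usage (against a real cluster):
#   set_memory deployment/my-app 1Gi 512Mi
```

Changing the resources of a deployment triggers a new rollout, so the pods are recreated with the new values.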

CodePudding user response：

Basically, the OOMKilled status means the container has crossed its memory limit (Out of Memory).

If the memory allocated by all of the processes in a container exceeds the memory limit, the node Out of Memory (OOM) killer will immediately select and kill a process in the container [1].

If the container does not exit immediately, an OOM kill is detectable as follows:

A container process exited with code 137, indicating it received a SIGKILL signal

The oom_kill counter in /sys/fs/cgroup/memory/memory.oom_control is incremented
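The two checks above can be sketched as small shell helpers. This assumes the cgroup v1 layout named above and a kernel that writes an oom_kill line into memory.oom_control; the function names are made up for this example.

```shell
#!/bin/sh
# Exit codes above 128 mean "terminated by signal (code - 128)".
signal_from_exit_code() {
    code="$1"
    if [ "$code" -gt 128 ]; then
        echo $((code - 128))
    else
        echo 0   # not a signal exit
    fi
}

# Read the oom_kill counter from a memory.oom_control file.
# The path argument is optional and defaults to the cgroup v1 location.
oom_kill_count() {
    file="${1:-/sys/fs/cgroup/memory/memory.oom_control}"
    awk '$1 == "oom_kill" { print $2 }' "$file"
}

# Usage inside the container:
#   signal_from_exit_code 137    # prints 9, the number of SIGKILL
#   oom_kill_count
```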

If one or more processes in a pod are OOM killed, when the pod subsequently exits, whether immediately or not, it will have phase Failed and reason OOMKilled. An OOM killed pod may be restarted depending on the value of restartPolicy [2].

To check the status:

oc get pod <pod name> -o yaml
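The full YAML is long, so it can help to print only the fields of interest. The sketch below uses a jsonpath template over status.containerStatuses; the helper name last_termination is hypothetical, and the reason and exit code columns stay empty for containers that have not been terminated before.

```shell
#!/bin/sh
# Print name, restart count, last termination reason and exit code
# for every container in a pod. The pod name is supplied by the caller.
last_termination() {
    pod="$1"
    oc get pod "$pod" -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\t"}{.lastState.terminated.reason}{"\t"}{.lastState.terminated.exitCode}{"\n"}{end}'
}

# Usage:
#   last_termination <pod name>
```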

Some applications consume large amounts of memory only during startup.

In this article one can find two solutions to handle OOMKilled issues:

You’ll need to size the container workload for different node configurations when using memory limits. Unfortunately, there is no formula that can be applied to calculate the rate of increase in container memory usage with an increasing number of CPUs on the node.

One of the kernel tunables that can help reduce the memory usage of containers is slub_max_order. A value of 0 (default is 3) can help bring down the overall memory usage of the container but can have negative performance implications for certain workloads. It’s advisable to benchmark the container workload with this tunable. [3]
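Since slub_max_order is a kernel boot parameter, one way to see whether a node was booted with an explicit value is to look at its kernel command line. This is a minimal sketch, assuming a shell on the node (for example through oc debug node/<node name>); the function name is invented for this example.

```shell
#!/bin/sh
# Report the slub_max_order value from a kernel command line, if set.
# Prints nothing when the parameter is absent, in which case the
# kernel default applies.
slub_max_order_from_cmdline() {
    file="${1:-/proc/cmdline}"
    tr ' ' '\n' < "$file" | sed -n 's/^slub_max_order=//p'
}

# Usage on the node:
#   slub_max_order_from_cmdline
```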

References: