I have a Java pod that gets restarted after a few days.
Looking at kubectl describe pod ...
there is just the following:
Last State: Terminated
Reason: Error
Exit Code: 137
This message, in my experience, usually means that there is an OutOfMemoryError somewhere, but looking at the logs I don't see anything useful.
Is there a way to execute a script (or save a few files) just before the inevitable restart? Something that could help me identify the problem.
For example: in case the restart was caused by an OutOfMemoryError, it would be wonderful if I could save a memory dump or the garbage collection logs.
CodePudding user response:
There are a few solutions for that:
- If you are developing the application and just want to debug the problem, you can stop the auto-restart by setting spec.restartPolicy: Never; in this case, K8s will not create a new pod and your logs will stay available.
- You can mount a volume into your application and configure log4j to write its logs to a file on that volume, so the logs persist across restarts (a minimal sketch of these first two options follows this list).
- The best solution is to use a log collector (fluentd, logstash) to ship the logs to Elasticsearch or an S3 bucket, or to use a managed service such as AWS CloudWatch, Datadog, ...
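For the first two options, a minimal sketch could look like the following (the pod name, container name, image and mount path are placeholders; the log4j file appender would then be pointed at the mounted path):
apiVersion: v1
kind: Pod
metadata:
  name: my-java-app            # placeholder
spec:
  restartPolicy: Never         # keep the failed pod around instead of restarting it
  containers:
  - name: app                  # placeholder
    image: my-java-image       # placeholder
    volumeMounts:
    - name: app-logs
      mountPath: /var/log/app  # configure the log4j file appender to write here
  volumes:
  - name: app-logs
    emptyDir: {}               # survives container restarts; use a PVC if logs must outlive the pod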
To address the OOM problem, you can give your application a bigger memory request (2-4 GB), then watch the memory usage with the top command or a cluster-monitoring tool (e.g. Prometheus):
apiVersion: v1
kind: Pod
metadata:
  name: ...
spec:
  containers:
  - name: ...
    image: ...
    resources:
      requests:
        memory: "2G"
CodePudding user response:
I found the two ways below to investigate an out-of-memory error in Kubernetes. Ideally you have a logging solution that keeps the logs; otherwise you can use --previous to read the logs of the previous run, which I generally use for debugging as long as it is the same pod that is in a crash loop.
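For example (the pod and container names are placeholders):
kubectl logs <pod-name> -c <container-name> --previous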
Write a thread dump to the stdout of the pod
You can take advantage of a preStop lifecycle hook to take a thread dump and write it to stdout, so you will be able to see it with k logs -f pod_name -c container_name --previous:
lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "jcmd 1 Thread.print > /proc/1/fd/1"]
This will also help if you ship your logs to Datadog or Elasticsearch, since the dump is written to the same stdout stream.
Writing a heap dump to a volume
You will need to update the java command (or the env) and the deployment chart:
serviceAccountName: {{ include "helm-chart.fullname" . }}
volumes:
- name: heap-dumps
  emptyDir: {}
containers:
- name: java-container
  volumeMounts:
  - name: heap-dumps
    mountPath: /dumps
Then add this env (the ENV line below uses Dockerfile syntax):
ENV JAVA_OPTS="-XX:+CrashOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/dumps/oom.bin -Djava.io.tmpdir=/tmp"
With that in place, you will be able to see what's going on in the JVM.
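If you would rather wire these options through the deployment chart than through the Dockerfile, one possible sketch (JAVA_TOOL_OPTIONS is picked up automatically by the JVM, whereas JAVA_OPTS only works if the image's entrypoint passes it to the java command) is:
containers:
- name: java-container
  env:
  - name: JAVA_TOOL_OPTIONS   # read by the JVM at startup
    value: "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/dumps/oom.bin"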
Some more JVM configuration for containers that can help you use the advanced options of a JVM running inside a container:
-XX:InitialRAMPercentage=50.0 -XX:MinRAMPercentage=50.0 -XX:MaxRAMPercentage=85.0
The JVM has been modified to be aware that it is running in a Docker container and will extract container-specific configuration information instead of querying the operating system.
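As a concrete example (the 2Gi limit below is only an illustrative value), combining these percentages with a container memory limit lets you reason about the resulting heap size:
resources:
  limits:
    memory: "2Gi"              # the JVM sizes itself against this limit, not the node's RAM
env:
- name: JAVA_TOOL_OPTIONS
  # With a 2 GiB limit, -XX:MaxRAMPercentage=85.0 caps the heap at about 0.85 * 2 GiB ~= 1.7 GiB,
  # leaving headroom for metaspace, thread stacks and direct buffers.
  value: "-XX:InitialRAMPercentage=50.0 -XX:MinRAMPercentage=50.0 -XX:MaxRAMPercentage=85.0"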