I have a Java pod that gets restarted after a few days.
Looking at kubectl describe pod ...
there is just the following:
Last State: Terminated
Reason: Error
Exit Code: 137
This message, in my experience, usually means that there is an OutOfMemoryError somewhere, but looking at the logs I don't see anything useful.
Is there a way to execute a script (or save a few files) just before the inevitable restart? Something that could help me identify the problem.
For example: in case the restart was caused by an OutOfMemoryError, it would be wonderful if I could save a memory dump or the garbage collection logs.
CodePudding user response:
There are a few solutions for that:
- If you are developing the application and just want to debug the problem, you can stop the auto-restart by setting spec.restartPolicy: Never; in this case, K8s will not create a new pod and your logs will stay available.
- You can mount a volume into your application and configure log4j to write its logs to a file on that volume, so the logs persist across restarts (a minimal sketch of these first two options follows this list).
- The best solution is to use a log collector (fluentd, logstash) to ship the logs to Elasticsearch or an S3 bucket, or to use a managed service such as AWS CloudWatch, Datadog, ...
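For the first two options, a minimal sketch could look like the following (the pod name, container name, image and mount path are placeholders; the log4j file appender would then be pointed at the mounted path):
apiVersion: v1
kind: Pod
metadata:
  name: my-java-app            # placeholder
spec:
  restartPolicy: Never         # keep the failed pod around instead of restarting it
  containers:
  - name: app                  # placeholder
    image: my-java-image       # placeholder
    volumeMounts:
    - name: app-logs
      mountPath: /var/log/app  # configure the log4j file appender to write here
  volumes:
  - name: app-logs
    emptyDir: {}               # survives container restarts; use a PVC if logs must outlive the pod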
To address the OOM problem, you can give your application a bigger memory request (2-4 GB), then watch the memory usage with the top command or a cluster-monitoring tool (e.g. Prometheus):
apiVersion: v1
kind: Pod
metadata:
  name: ...
spec:
  containers:
  - name: ...
    image: ...
    resources:
      requests:
        memory: "2G"
CodePudding user response:
I found the two ways below to investigate an out-of-memory error in Kubernetes. Ideally you have a logging solution that keeps the logs; otherwise you can use --previous to read the logs of the previous run, which I generally use for debugging as long as it is the same pod that is in a crash loop.
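For example (the pod and container names are placeholders):
kubectl logs <pod-name> -c <container-name> --previous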
Write a thread dump to the stdout of the pod
You can take advantage of a preStop lifecycle hook to take a thread dump and write it to stdout, so you will be able to see it with k logs -f pod_name -c container_name --previous:
lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "jcmd 1 Thread.print > /proc/1/fd/1"]
This will also help if you ship your logs to Datadog or Elasticsearch, since the dump is written to the same stdout stream.
Writing a heap dump to a volume
You will need to update the java command (or the env) and the deployment chart:
serviceAccountName: {{ include "helm-chart.fullname" . }}
volumes:
- name: heap-dumps
  emptyDir: {}
containers:
- name: java-container
  volumeMounts:
  - name: heap-dumps
    mountPath: /dumps
Then add this env (the ENV line below uses Dockerfile syntax):
ENV JAVA_OPTS="-XX:+CrashOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/dumps/oom.bin -Djava.io.tmpdir=/tmp"
With that in place, you will be able to see what's going on in the JVM.
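If you would rather wire these options through the deployment chart than through the Dockerfile, one possible sketch (JAVA_TOOL_OPTIONS is picked up automatically by the JVM, whereas JAVA_OPTS only works if the image's entrypoint passes it to the java command) is:
containers:
- name: java-container
  env:
  - name: JAVA_TOOL_OPTIONS   # read by the JVM at startup
    value: "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/dumps/oom.bin"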
Some more JVM configuration for containers that can help you use the advanced options of a JVM running inside a container:
-XX:InitialRAMPercentage=50.0 -XX:MinRAMPercentage=50.0 -XX:MaxRAMPercentage=85.0
The JVM has been modified to be aware that it is running in a Docker container and will extract container-specific configuration information instead of querying the operating system.
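As a concrete example (the 2Gi limit below is only an illustrative value), combining these percentages with a container memory limit lets you reason about the resulting heap size:
resources:
  limits:
    memory: "2Gi"              # the JVM sizes itself against this limit, not the node's RAM
env:
- name: JAVA_TOOL_OPTIONS
  # With a 2 GiB limit, -XX:MaxRAMPercentage=85.0 caps the heap at about 0.85 * 2 GiB ~= 1.7 GiB,
  # leaving headroom for metaspace, thread stacks and direct buffers.
  value: "-XX:InitialRAMPercentage=50.0 -XX:MinRAMPercentage=50.0 -XX:MaxRAMPercentage=85.0"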