@DanLebrero.

software, simply

How to do a Java/JVM heap dump in Kubernetes

A simple task that is a little bit of a headache on Kubernetes

Image attribution: Brown Station Road Sanitary Landfill (CC BY 2.0) by Steve Snodgrass

Kubernetes is so awesome that one of our JVM containers had been periodically running out of memory for more than a year, and we only recently noticed.

Once we noticed the issue, we obviously wanted to find out what was going on, but we could not reproduce it locally.

Also, the issue happened so sporadically that we could not just jump onto the ailing container and run jmap. Before we had the chance, Kubernetes had already killed and restarted the container, which also wiped out any heap dump that the JVM might have written thanks to the -XX:+HeapDumpOnOutOfMemoryError flag.

After a lot of head scratching, we found that the solution was quite simple, though not obvious if you are just starting with Kubernetes:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: your-app
spec:
  replicas: 1
  selector:
    matchLabels:
      test: heapdump
  template:
    metadata:
      labels:
        test: heapdump
    spec:
      containers:
      - name: a-jvm-container
        image: openjdk:11.0.1-jdk-slim-sid
        command: ["java", "-XX:+HeapDumpOnOutOfMemoryError", "-XX:HeapDumpPath=/dumps/oom.bin", "-jar", "yourapp.jar"]
        volumeMounts:
        - name: heap-dumps
          mountPath: /dumps
      volumes:
      - name: heap-dumps
        emptyDir: {}

What we are doing is adding an emptyDir volume to the pod, mounting it in the container, and configuring the JVM to write its heap dumps to that directory.

The first part of the puzzle is that when Kubernetes kills your container because it is not responding to the health check (the liveness probe), it just restarts the container in place. It does not reschedule the pod, so the pod is not moved to another node.

The other part of the puzzle is that an emptyDir volume lives as long as the pod stays on its node. It is deleted only when the pod is removed from that node, for example because it is deleted or rescheduled, and not when a container restarts.

Putting both things together means that, after the container is restarted, the new container mounts the same emptyDir, which still contains the heap dump from the previous run. So you can kubectl cp those files at any time after the event.

OOM on startup

If the OutOfMemoryError happens during startup, you are probably not going to be able to copy the dump before the container is restarted again.

In this case, the little trick is to add a very simple and tiny sidecar to your pod and mount the same emptyDir in it, so you can access the heap dumps through the sidecar container instead of the main container.
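A minimal sketch of such a sidecar, written as an extra entry for the containers list of the Deployment above. The container name heap-dump-access, the busybox image and the sleep loop are assumptions chosen for illustration; any small image with a shell and tar should do:

```yaml
# Additional entry for the `containers:` list of the Deployment above.
# Name, image and command are illustrative assumptions, not part of the original setup.
- name: heap-dump-access            # hypothetical sidecar name
  image: busybox:1.36               # small image that ships a shell and tar
  # Do nothing forever, so the container stays up while the JVM container restarts
  command: ["/bin/sh", "-c", "while true; do sleep 3600; done"]
  volumeMounts:
  - name: heap-dumps                # the same emptyDir volume the JVM container mounts
    mountPath: /dumps
# With the sidecar running, the dumps can be reached through it, for example:
#   kubectl exec <pod-name> -c heap-dump-access -- ls -l /dumps
#   kubectl cp <pod-name>:/dumps ./dumps -c heap-dump-access
```

Because the sidecar never crashes, it stays reachable even while the JVM container is stuck in a restart loop.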

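Remember in this case to set the -XX:HeapDumpPath option so that it generates a unique file name on every start; otherwise the dumps from successive restarts would all target the same file.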
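One possible way to get a unique name, sketched here as a replacement for the JVM container's command in the Deployment above, is to start java through a shell and put the start time in the file name. The oom-<timestamp>.bin naming scheme is an assumption; anything that changes on every start would work:

```yaml
# Replacement for the JVM container's `command:` in the Deployment above.
# The file naming scheme is an illustrative assumption.
command: ["/bin/sh", "-c"]
args:
- |
  # Backticks rather than $(...) so the text cannot be mistaken for a Kubernetes $(VAR) reference.
  # exec replaces the shell with java, so the JVM still receives the container's signals.
  exec java -XX:+HeapDumpOnOutOfMemoryError \
    -XX:HeapDumpPath=/dumps/oom-`date +%s`.bin \
    -jar yourapp.jar
```

Each restart then writes to its own file, so an earlier dump is still there when you come to collect it.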

Shipping the heap dump out of Kubernetes

If you don’t want to, or cannot, access the Kubernetes pod directly, you can always ship the heap dumps out of Kubernetes.

In our case (credit to Ivan Perdomo for the work), we did it by adding a sidecar that also mounts that emptyDir volume and uses inotify to watch for changes in that directory. When the heap dump file is closed, the sidecar starts copying it to a Google Cloud Storage bucket:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: your-app
spec:
  replicas: 1
  selector:
    matchLabels:
      test: heapdump
  template:
    metadata:
      labels:
        test: heapdump
    spec:
      containers:
      - name: a-jvm-container
        image: openjdk:11.0.1-jdk-slim-sid
        command: ["java", "-XX:+HeapDumpOnOutOfMemoryError", "-XX:HeapDumpPath=/dumps/oom.bin", "-jar", "yourapp.jar"]
        volumeMounts:
        - name: heap-dumps
          mountPath: /dumps
      # Sidecar that uploads finished dumps. The volume providing /secrets/jvm-debug.json is not shown.
      - name: ship-heap-dump
        image: google/cloud-sdk:206.0.0-alpine
        command: ["/bin/sh", "-c"]
        args:
        - |
          apk add --no-cache inotify-tools &&
          gcloud auth activate-service-account --key-file=/secrets/jvm-debug.json &&
          inotifywait -m /dumps -e close_write | while read path action file; do gsutil cp "$path$file" "gs://heap-dump/$file"; done;
        volumeMounts:
        - name: heap-dumps
          mountPath: /dumps
      volumes:
      - name: heap-dumps
        emptyDir: {}

As this tripped us up on our happy Kubernetes journey, maybe it will make yours easier.


Tagged in : Clojure Java Kubernetes