RabbitMQ pod is crashing unexpectedly


I have a pod running RabbitMQ. Below is the deployment manifest:

apiVersion: v1
kind: Service
metadata:
  name: service-rabbitmq
spec:
  selector:
    app: service-rabbitmq
  ports:
    - port: 5672
      targetPort: 5672
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment-rabbitmq
spec:
  selector:
    matchLabels:
      app: deployment-rabbitmq
  template:
    metadata:
      labels:
        app: deployment-rabbitmq
    spec:
      containers:
        - name: rabbitmq
          image: rabbitmq:latest
          volumeMounts:
            - name: rabbitmq-data-volume
              mountPath: /var/lib/rabbitmq
          resources:
            requests:
              cpu: 250m
              memory: 128Mi
            limits:
              cpu: 750m
              memory: 256Mi
      volumes:
        - name: rabbitmq-data-volume
          persistentVolumeClaim:
            claimName: rabbitmq-pvc

When I deploy it to my local cluster, the pod runs for a while and then crashes, so it basically enters a crash loop. Here are the logs I got from the pod:

$ kubectl logs deployment-rabbitmq-649b8479dc-kt9s4
2021-10-14 06:46:36.182390+00:00 [info] <0.222.0> Feature flags: list of feature flags found:
2021-10-14 06:46:36.221717+00:00 [info] <0.222.0> Feature flags:   [ ] implicit_default_bindings
2021-10-14 06:46:36.221768+00:00 [info] <0.222.0> Feature flags:   [ ] maintenance_mode_status
2021-10-14 06:46:36.221792+00:00 [info] <0.222.0> Feature flags:   [ ] quorum_queue
2021-10-14 06:46:36.221813+00:00 [info] <0.222.0> Feature flags:   [ ] stream_queue
2021-10-14 06:46:36.221916+00:00 [info] <0.222.0> Feature flags:   [ ] user_limits
2021-10-14 06:46:36.221933+00:00 [info] <0.222.0> Feature flags:   [ ] virtual_host_metadata
2021-10-14 06:46:36.221953+00:00 [info] <0.222.0> Feature flags: feature flag states written to disk: yes
2021-10-14 06:46:37.018537+00:00 [noti] <0.44.0> Application syslog exited with reason: stopped
2021-10-14 06:46:37.018646+00:00 [noti] <0.222.0> Logging: switching to configured handler(s); following messages may not be visible in this log output
2021-10-14 06:46:37.045601+00:00 [noti] <0.222.0> Logging: configured log handlers are now ACTIVE
2021-10-14 06:46:37.635024+00:00 [info] <0.222.0> ra: starting system quorum_queues
2021-10-14 06:46:37.635139+00:00 [info] <0.222.0> starting Ra system: quorum_queues in directory: /var/lib/rabbitmq/mnesia/rabbit@deployment-rabbitmq-649b8479dc-kt9s4/quorum/rabbit@deployment-rabbitmq-649b8479dc-kt9s4
2021-10-14 06:46:37.849041+00:00 [info] <0.259.0> ra: meta data store initialised for system quorum_queues. 0 record(s) recovered
2021-10-14 06:46:37.877504+00:00 [noti] <0.264.0> WAL: ra_log_wal init, open tbls: ra_log_open_mem_tables, closed tbls: ra_log_closed_mem_tables

These logs aren't very helpful; I can't find any error message in them. The only potentially useful line is Application syslog exited with reason: stopped, but as far as I understand that isn't the cause. The event log isn't helpful either:

$ kubectl describe pods deployment-rabbitmq-649b8479dc-kt9s4
Name:         deployment-rabbitmq-649b8479dc-kt9s4
Namespace:    default
Priority:     0
Node:         docker-desktop/192.168.65.4
Start Time:   Thu, 14 Oct 2021 12:45:03 +0600
Labels:       app=deployment-rabbitmq
              pod-template-hash=649b8479dc
              skaffold.dev/run-id=7af5e1bb-e0c8-4021-a8a0-0c8bf43630b6
Annotations:  <none>
Status:       Running
IP:           10.1.5.138
IPs:
  IP:           10.1.5.138
Controlled By:  ReplicaSet/deployment-rabbitmq-649b8479dc
Containers:
  rabbitmq:
    Container ID:   docker://de309f94163c071afb38fb8743d106923b6bda27325287e82bc274e362f1f3be
    Image:          rabbitmq:latest
    Image ID:       docker-pullable://rabbitmq@sha256:d8efe7b818e66a13fdc6fdb84cf527984fb7d73f52466833a20e9ec298ed4df4
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    0
      Started:      Thu, 14 Oct 2021 13:56:29 +0600
      Finished:     Thu, 14 Oct 2021 13:56:39 +0600
    Ready:          False
    Restart Count:  18
    Limits:
      cpu:     750m
      memory:  256Mi
    Requests:
      cpu:        250m
      memory:     128Mi
    Environment:  <none>
    Mounts:
      /var/lib/rabbitmq from rabbitmq-data-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9shdv (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  rabbitmq-data-volume:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  rabbitmq-pvc
    ReadOnly:   false
  kube-api-access-9shdv:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason   Age                    From     Message
  ----     ------   ----                   ----     -------
  Normal   Pulled   23m (x6 over 50m)      kubelet  (combined from similar events): Successfully pulled image "rabbitmq:latest" in 4.267310231s
  Normal   Pulling  18m (x16 over 73m)     kubelet  Pulling image "rabbitmq:latest"
  Warning  BackOff  3m45s (x307 over 73m)  kubelet  Back-off restarting failed container

What could be the reason for this crash-loop?

NOTE: rabbitmq-pvc is successfully bound. No issue there.

Update:

This answer indicates that RabbitMQ should be deployed as a StatefulSet. So I adjusted the manifest like so:

apiVersion: v1
kind: Service
metadata:
  name: service-rabbitmq
spec:
  selector:
    app: service-rabbitmq
  ports:
    - name: rabbitmq-amqp
      port: 5672
    - name: rabbitmq-http
      port: 15672
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: statefulset-rabbitmq
spec:
  selector:
    matchLabels:
      app: statefulset-rabbitmq
  serviceName: service-rabbitmq
  template:
    metadata:
      labels:
        app: statefulset-rabbitmq
    spec:
      containers:
        - name: rabbitmq
          image: rabbitmq:latest
          volumeMounts:
            - name: rabbitmq-data-volume
              mountPath: /var/lib/rabbitmq/mnesia
          resources:
            requests:
              cpu: 250m
              memory: 128Mi
            limits:
              cpu: 750m
              memory: 256Mi
      volumes:
        - name: rabbitmq-data-volume
          persistentVolumeClaim:
            claimName: rabbitmq-pvc

The pod still crash-loops, but the logs are slightly different.

$ kubectl logs statefulset-rabbitmq-0
2021-10-14 09:38:26.138224+00:00 [info] <0.222.0> Feature flags: list of feature flags found:
2021-10-14 09:38:26.158953+00:00 [info] <0.222.0> Feature flags:   [x] implicit_default_bindings
2021-10-14 09:38:26.159015+00:00 [info] <0.222.0> Feature flags:   [x] maintenance_mode_status
2021-10-14 09:38:26.159037+00:00 [info] <0.222.0> Feature flags:   [x] quorum_queue
2021-10-14 09:38:26.159078+00:00 [info] <0.222.0> Feature flags:   [x] stream_queue
2021-10-14 09:38:26.159183+00:00 [info] <0.222.0> Feature flags:   [x] user_limits
2021-10-14 09:38:26.159236+00:00 [info] <0.222.0> Feature flags:   [x] virtual_host_metadata
2021-10-14 09:38:26.159270+00:00 [info] <0.222.0> Feature flags: feature flag states written to disk: yes
2021-10-14 09:38:26.830814+00:00 [noti] <0.44.0> Application syslog exited with reason: stopped
2021-10-14 09:38:26.830925+00:00 [noti] <0.222.0> Logging: switching to configured handler(s); following messages may not be visible in this log output
2021-10-14 09:38:26.852048+00:00 [noti] <0.222.0> Logging: configured log handlers are now ACTIVE
2021-10-14 09:38:33.754355+00:00 [info] <0.222.0> ra: starting system quorum_queues
2021-10-14 09:38:33.754526+00:00 [info] <0.222.0> starting Ra system: quorum_queues in directory: /var/lib/rabbitmq/mnesia/rabbit@statefulset-rabbitmq-0/quorum/rabbit@statefulset-rabbitmq-0
2021-10-14 09:38:33.760365+00:00 [info] <0.290.0> ra: meta data store initialised for system quorum_queues. 0 record(s) recovered
2021-10-14 09:38:33.761023+00:00 [noti] <0.302.0> WAL: ra_log_wal init, open tbls: ra_log_open_mem_tables, closed tbls: ra_log_closed_mem_tables

As you can see, the feature flags are now marked as enabled. There are no other notable changes, so I still need help.

! New Issue !

Head over here.

CodePudding user response:

The pod is getting OOMKilled (see Last State: Terminated with Reason: OOMKilled in the kubectl describe output), which means the kernel killed the container for exceeding its memory limit. You need to assign more memory to the pod.
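A minimal sketch of the adjusted resources block for the rabbitmq container (the exact values here are illustrative, not a RabbitMQ recommendation; the point is that a 256Mi limit is too tight for the Erlang VM plus RabbitMQ itself, so raise the limit until the OOM kills stop):

```yaml
          resources:
            requests:
              cpu: 250m
              memory: 512Mi
            limits:
              cpu: 750m
              memory: 1Gi
```

After re-applying the manifest, watch kubectl describe again: if Last State no longer shows OOMKilled and the restart count stops climbing, the limit was the problem.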
