I am using official Helm chart for airflow. Every Pod works properly except Worker node.
Even in that worker node, 2 of the containers (git-sync and worker-log-groomer) works fine.
The error happened in the 3rd container (worker) with CrashLoopBackOff. Exit code status as 137 OOMkilled.
In my openshift, memory usage is showing to be at 70%.
Although this error comes because of memory leak. This doesn't happen to be the case for this one. Please help, I have been going on in this one for a week now.
Kubectl describe pod airflow-worker-0 ->
worker:
Container ID: <>
Image: <>
Image ID: <>
Port: <>
Host Port: <>
Args:
bash
-c
exec \
airflow celery worker
State: Running
Started: <>
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
Started: <>
Finished: <>
Ready: True
Restart Count: 3
Limits:
ephemeral-storage: 30G
memory: 1Gi
Requests:
cpu: 50m
ephemeral-storage: 100M
memory: 409Mi
Environment:
DUMB_INIT_SETSID: 0
AIRFLOW__CORE__FERNET_KEY: <> Optional: false
Mounts:
<>
git-sync:
Container ID: <>
Image: <>
Image ID: <>
Port: <none>
Host Port: <none>
State: Running
Started: <>
Ready: True
Restart Count: 0
Limits:
ephemeral-storage: 30G
memory: 1Gi
Requests:
cpu: 50m
ephemeral-storage: 100M
memory: 409Mi
Environment:
GIT_SYNC_REV: HEAD
Mounts:
<>
worker-log-groomer:
Container ID: <>
Image: <>
Image ID: <>
Port: <none>
Host Port: <none>
Args:
bash
/clean-logs
State: Running
Started: <>
Ready: True
Restart Count: 0
Limits:
ephemeral-storage: 30G
memory: 1Gi
Requests:
cpu: 50m
ephemeral-storage: 100M
memory: 409Mi
Environment:
AIRFLOW__LOG_RETENTION_DAYS: 5
Mounts:
<>
I am pretty much sure you know the answer. Read all your articles on airflow. Thank you :) https://stackoverflow.com/users/1376561/marc-lamberti
CodePudding user response:
The issues occurs due to placing a limit in "resources" under helm chart - values.yaml in any of the pods.
By default it is -
resources: {}
but this causes an issue as pods can access unlimited memory as required.
By changing it to -
resources:
limits:
cpu: 200m
memory: 2Gi
requests:
cpu: 100m
memory: 512Mi
It makes the pod clear on how much it can access and request. This solved my issue.