I am trying to deploy AIrflow on EKS Fargate using Helm. I have the EKS cluster, SC, PV, and PVC, along with namespace and fargate-profile(dev) all set up.
My problem comes when I do helm install:
helm upgrade --install airflow apache-airflow/airflow -n dev --values values.yaml --set volumePermissions.enbled=true --debug
[![list of pods][1]][1]
Above is the list of pods. The last 3 keep going into Crashloopbackoff.
Here is the describe of webserver pod:
C:\Users\tanma>kubectl describe pods -n dev airflow-webserver-775d548b98-wd5x8
Name: airflow-webserver-775d548b98-wd5x8
Namespace: dev
Priority: 2000001000
Priority Class Name: system-node-critical
Service Account: airflow-webserver
Node: fargate-ip-192-168-161-147.us-west-2.compute.internal/192.168.161.147
Start Time: Thu, 13 Oct 2022 17:12:54 -0400
Labels: component=webserver
eks.amazonaws.com/fargate-profile=dev
pod-template-hash=775d548b98
release=airflow
tier=airflow
Annotations: CapacityProvisioned: 0.25vCPU 0.5GB
Logging: LoggingDisabled: LOGGING_CONFIGMAP_NOT_FOUND
checksum/airflow-config: 978d20ff42d3de620bee24f2e35b1769f20ebd948890bf474bd940624e39f150
checksum/extra-configmaps: 2e44e493035e2f6a255d08f8104087ff10d30aef6f63176f1b18f75f73295598
checksum/extra-secrets: bb91ef06ddc31c0c5a29973832163d8b0b597812a793ef911d33b622bc9d1655
checksum/metadata-secret: d9bd679df96f2631a8559d02cc528fd78c3d73c06289be9816d83fb332e05b5e
checksum/pgbouncer-config-secret: da52bd1edfe820f0ddfacdebb20a4cc6407d296ee45bcb500a6407e2261a5ba2
checksum/webserver-config: 4a2281a4e3ed0cc5e89f07aba3c1bb314ea51c17cb5d2b41e9b045054a6b5c72
checksum/webserver-secret-key: a1e18ebcc73a51b6bafe52d95eee84dcdf132559cac0248fff6e58e409b4505e
kubernetes.io/psp: eks.privileged
Status: Running
IP: 192.168.161.147
IPs:
IP: 192.168.161.147
Controlled By: ReplicaSet/airflow-webserver-775d548b98
Init Containers:
wait-for-airflow-migrations:
Container ID: containerd://bf4919f7a268bbeaf1a8f8779e4da1551d76f622d9ce970f18a3f2a1f14c24d7
Image: apache/airflow:2.4.1
Image ID: docker.io/apache/airflow@sha256:e077b68d81d56d773bddbcdc8941b7a2c16a2087a641005dfc5f1b8dcadec90a
Port: <none>
Host Port: <none>
Args:
airflow
db
check-migrations
--migration-wait-timeout=60
State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 13 Oct 2022 17:14:40 -0400
Finished: Thu, 13 Oct 2022 17:15:12 -0400
Ready: True
Restart Count: 0
Environment:
AIRFLOW__CORE__FERNET_KEY: <set to the key 'fernet-key' in secret 'airflow-fernet-key'> Optional: false
AIRFLOW__CORE__SQL_ALCHEMY_CONN: <set to the key 'connection' in secret 'airflow-airflow-metadata'> Optional: false
AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: <set to the key 'connection' in secret 'airflow-airflow-metadata'> Optional: false
AIRFLOW_CONN_AIRFLOW_DB: <set to the key 'connection' in secret 'airflow-airflow-metadata'> Optional: false
AIRFLOW__WEBSERVER__SECRET_KEY: <set to the key 'webserver-secret-key' in secret 'airflow-webserver-secret-key'> Optional: false
Mounts:
/opt/airflow/airflow.cfg from config (ro,path="airflow.cfg")
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pntv6 (ro)
Containers:
webserver:
Container ID: containerd://e479b50af8eefc8c99971cc9cc9b6345f826c09d5f770276b33518340298359d
Image: apache/airflow:2.4.1
Image ID: docker.io/apache/airflow@sha256:e077b68d81d56d773bddbcdc8941b7a2c16a2087a641005dfc5f1b8dcadec90a
Port: 8080/TCP
Host Port: 0/TCP
Args:
bash
-c
exec airflow webserver
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 143
Started: Thu, 13 Oct 2022 17:40:25 -0400
Finished: Thu, 13 Oct 2022 17:42:19 -0400
Ready: False
Restart Count: 9
Liveness: http-get http://:8080/health delay=15s timeout=30s period=5s #success=1 #failure=20
Readiness: http-get http://:8080/health delay=15s timeout=30s period=5s #success=1 #failure=20
Environment:
AIRFLOW__CORE__FERNET_KEY: <set to the key 'fernet-key' in secret 'airflow-fernet-key'> Optional: false
AIRFLOW__CORE__SQL_ALCHEMY_CONN: <set to the key 'connection' in secret 'airflow-airflow-metadata'> Optional: false
AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: <set to the key 'connection' in secret 'airflow-airflow-metadata'> Optional: false
AIRFLOW_CONN_AIRFLOW_DB: <set to the key 'connection' in secret 'airflow-airflow-metadata'> Optional: false
AIRFLOW__WEBSERVER__SECRET_KEY: <set to the key 'webserver-secret-key' in secret 'airflow-webserver-secret-key'> Optional: false
Mounts:
/opt/airflow/airflow.cfg from config (ro,path="airflow.cfg")
/opt/airflow/config/airflow_local_settings.py from config (ro,path="airflow_local_settings.py")
/opt/airflow/logs from logs (rw)
/opt/airflow/pod_templates/pod_template_file.yaml from config (ro,path="pod_template_file.yaml")
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pntv6 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: airflow-airflow-config
Optional: false
logs:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: af-efs-fargate-1
ReadOnly: false
kube-api-access-pntv6:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning LoggingDisabled 31m fargate-scheduler Disabled logging because aws-logging configmap was not found. configmap "aws-logging" not found
Normal Scheduled 30m fargate-scheduler Successfully assigned dev/airflow-webserver-775d548b98-wd5x8 to fargate-ip-192-168-161-147.us-west-2.compute.internal
Normal Pulling 30m kubelet Pulling image "apache/airflow:2.4.1"
Normal Pulled 28m kubelet Successfully pulled image "apache/airflow:2.4.1" in 1m43.155801441s
Normal Created 28m kubelet Created container wait-for-airflow-migrations
Normal Started 28m kubelet Started container wait-for-airflow-migrations
Normal Pulled 28m kubelet Container image "apache/airflow:2.4.1" already present on machine
Normal Created 28m kubelet Created container webserver
Normal Started 28m kubelet Started container webserver
Warning Unhealthy 27m (x9 over 27m) kubelet Readiness probe failed: Get "http://192.168.161.147:8080/health": dial tcp 192.168.161.147:8080: connect: connection refused
Warning Unhealthy 10m (x156 over 27m) kubelet Liveness probe failed: Get "http://192.168.161.147:8080/health": dial tcp 192.168.161.147:8080: connect: connection refused
Warning BackOff 10s (x44 over 14m) kubelet Back-off restarting failed container
Any thoughts on why the pods keep restarting?
Appreciate your help here.
Thanks
[1]: https://i.stack.imgur.com/IPocP.png
CodePudding user response:
Your host port is 0. I guess that could cause the webserver not to be able to expose its port. However, you'd have to check the logs of the webserver pod itself to make sure this is the problem.
You need to make sure that this endpoint is available (which is not currently); http://192.168.161.147:8080/health
CodePudding user response:
Ended up increasing the resources for webserver and this solved the problem.
THanks