I'm new to EKS, Helm, and Tiller. I am looking into why our build is breaking for our cluster deployment. When running helm upgrade, I get the error Error: could not find a ready tiller pod. I see a lot of threads about this problem, but I want a better understanding of what's happening before I put in a PR.
One thread suggested adding the --upgrade and --wait flags to the helm init command. That didn't seem to do the trick. It looks like we are also using the --client-only flag, which makes me think it is the culprit. However, that flag was added 2 years ago, so I don't understand why it would be breaking now.
helm init --upgrade --wait --stable-repo-url=https://charts.helm.sh/stable --client-only --kubeconfig ./cluster_config
echo "Helm repo update"
helm repo update --kubeconfig ./cluster_config
echo "helm upgrade"
helm upgrade --install aws-efs-csi-driver https://github.com/kubernetes-sigs/aws-efs-csi-driver/releases/download/v0.3.0/helm-chart.tgz --force --kubeconfig ./cluster_config
Update: I added helm version --tls to the script. Here is the output:
13:27:00 [docker] [deploy-terraform] [terraform apply] null_resource.setup_cluster (local-exec): Helm Version
apply] null_resource.setup_cluster (local-exec): Client: &version.Version{SemVer:"v2.16.9", GitCommit:"8ad7037828e5a0fca1009dabe290130da6368e39", GitTreeState:"clean"}
13:27:05 [docker] [deploy-terraform] [terraform apply] null_resource.setup_cluster (local-exec): Error: Get "http://localhost:8080/api/v1/namespaces/kube-system/pods?labelSelector=app=helm,name=tiller": dial tcp 127.0.0.1:8080: connect: connection refused
Solution: upgrade to Helm v3. :-/
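For reference, a Helm v3 version of the deploy steps might look roughly like the sketch below. It assumes a Helm 3 binary on the PATH and the same ./cluster_config kubeconfig. Helm 3 has no helm init and no Tiller, so that step disappears and helm talks to the API server directly. The function name deploy_efs_csi_driver is hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper: the question's deploy steps rewritten for Helm v3.
# There is no helm init step because Helm 3 does not use Tiller.
deploy_efs_csi_driver() {
  local kubeconfig="${1:-./cluster_config}"

  # Helm 3 starts with no repositories configured, so add the stable repo explicitly.
  helm repo add stable https://charts.helm.sh/stable --kubeconfig "$kubeconfig"

  echo "Helm repo update"
  helm repo update --kubeconfig "$kubeconfig"

  echo "helm upgrade"
  # The chart is still referenced by its release tarball URL.
  helm upgrade --install aws-efs-csi-driver \
    https://github.com/kubernetes-sigs/aws-efs-csi-driver/releases/download/v0.3.0/helm-chart.tgz \
    --force --kubeconfig "$kubeconfig"
}
```

Call it as deploy_efs_csi_driver ./cluster_config. Note that releases originally installed through Tiller are not visible to Helm 3 until they are migrated (for example with the helm-2to3 plugin), so an existing aws-efs-csi-driver release may need that step first.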
Answer:
If you specify the --client-only flag, the Tiller server is never installed in the cluster, so there is no Tiller pod for helm to find. That is your problem. Remove the flag and it should work.
See the docs for more details.
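Concretely, that means running the question's helm init line without --client-only, so that helm init installs Tiller (or, because of --upgrade, upgrades an existing one) and --wait blocks until Tiller is ready. A sketch assuming the same ./cluster_config kubeconfig; the function name init_helm_with_tiller is hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper: the question's helm init line minus --client-only.
# Without --client-only, helm init also deploys Tiller into the cluster,
# and --wait blocks until the Tiller pod is running and ready.
init_helm_with_tiller() {
  local kubeconfig="${1:-./cluster_config}"
  helm init --upgrade --wait \
    --stable-repo-url=https://charts.helm.sh/stable \
    --kubeconfig "$kubeconfig"
}
```

On RBAC-enabled clusters such as EKS, Tiller typically also needs a service account with enough permissions, which helm init accepts through --service-account.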
Update: Based on the output of helm version --tls and the discussion in the comments, the Tiller service is not working and its pods are stuck in ImagePullBackOff. Fixing that will fix this issue.
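To confirm the ImagePullBackOff state, you can list the Tiller pods using the label selector from the error message (app=helm,name=tiller) and read their events. A sketch assuming kubectl can reach the cluster through ./cluster_config; the function name show_tiller_pods is hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper: inspect the Tiller pods that helm is looking for.
# The label selector matches the one in the helm error message.
show_tiller_pods() {
  local kubeconfig="${1:-./cluster_config}"

  # The STATUS column shows ErrImagePull / ImagePullBackOff when the image cannot be pulled.
  kubectl get pods --namespace kube-system \
    --selector app=helm,name=tiller \
    --kubeconfig "$kubeconfig"

  # The Events section at the end of the output explains why the pull failed.
  kubectl describe pods --namespace kube-system \
    --selector app=helm,name=tiller \
    --kubeconfig "$kubeconfig"
}
```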
The Tiller pods can no longer pull their image because the upstream registry has been decommissioned. The image name can be changed to omio/gcr.io.kubernetes-helm.tiller:v2.16.1, a mirror of the Tiller image that still works for Helm v2.
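If Tiller is already deployed, one way to switch it to the mirror image is to update the Tiller deployment in place with kubectl. This sketch assumes Tiller was installed by helm init with its defaults (deployment tiller-deploy, container tiller, namespace kube-system) and that the mirror still hosts that tag; the function name use_mirrored_tiller_image is hypothetical.

```shell
#!/usr/bin/env bash

# Mirror image suggested in the answer; assumed to still be available.
TILLER_IMAGE="omio/gcr.io.kubernetes-helm.tiller:v2.16.1"

# Hypothetical helper: point the existing Tiller deployment at the mirror image.
# Assumes helm init defaults: deployment "tiller-deploy", container "tiller",
# namespace "kube-system".
use_mirrored_tiller_image() {
  local kubeconfig="${1:-./cluster_config}"

  kubectl set image deployment/tiller-deploy "tiller=${TILLER_IMAGE}" \
    --namespace kube-system --kubeconfig "$kubeconfig"

  # Block until the new Tiller pod has rolled out.
  kubectl rollout status deployment/tiller-deploy \
    --namespace kube-system --kubeconfig "$kubeconfig"
}
```

For a fresh install, the same image can instead be passed to helm init through its --tiller-image flag.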