helm upgrade: Error: could not find a ready tiller pod

I'm new to EKS, Helm, and Tiller. I am looking into why the build for our cluster deployment is breaking. Running helm upgrade fails with "Error: could not find a ready tiller pod". I see a lot of threads about this problem, but I want to understand what is happening before I put in a PR.

One thread suggested adding the --upgrade and --wait flags to the helm init command. That didn't do the trick. We are also using the --client-only flag, which makes me think it is the culprit. However, that flag was added 2 years ago, so I don't understand why it would start breaking now.

helm init --upgrade --wait --stable-repo-url=https://charts.helm.sh/stable --client-only --kubeconfig ./cluster_config

echo "Helm repo update"
helm repo update --kubeconfig ./cluster_config

echo "helm upgrade"
helm upgrade --install aws-efs-csi-driver https://github.com/kubernetes-sigs/aws-efs-csi-driver/releases/download/v0.3.0/helm-chart.tgz --force --kubeconfig ./cluster_config

Update: I added helm version --tls to the script. Output:

13:27:00  [docker] [deploy-terraform] [terraform apply] null_resource.setup_cluster (local-exec): Helm Version
apply] null_resource.setup_cluster (local-exec): Client: &version.Version{SemVer:"v2.16.9", GitCommit:"8ad7037828e5a0fca1009dabe290130da6368e39", GitTreeState:"clean"}
13:27:05  [docker] [deploy-terraform] [terraform apply] null_resource.setup_cluster (local-exec): Error: Get "http://localhost:8080/api/v1/namespaces/kube-system/pods?labelSelector=app=helm,name=tiller": dial tcp 127.0.0.1:8080: connect: connection refused
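
A note on the last log line: http://localhost:8080 is the address the Kubernetes client falls back to when it has no kubeconfig, so one plausible reading is that this helm version call ran without --kubeconfig, unlike the other commands in the script. A minimal sketch of the check with the kubeconfig passed explicitly follows. The function name helm_version_check is hypothetical, and the ./cluster_config default is taken from the script above.

```shell
# Hypothetical helper: ask for the client and Tiller versions using the same
# kubeconfig as the rest of the script, rather than the localhost:8080 default.
helm_version_check() {
  kubeconfig="${1:-./cluster_config}"
  helm version --tls --kubeconfig "$kubeconfig"
}
```

Calling helm_version_check ./cluster_config from the deploy script should then either print a Server: line or report an error that is about Tiller itself instead of a refused connection.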

Solution: upgrade to Helm v3 :-/
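
For anyone landing here, a rough sketch of what the deploy step might look like on Helm 3. Helm 3 has no Tiller and no helm init, so the step that was failing goes away. The function name deploy_efs_csi_driver is hypothetical; the chart URL, release name, --force flag, and kubeconfig path are copied from the script above. This assumes the v0.3.0 chart installs cleanly under Helm 3, which is not verified here.

```shell
# Hypothetical Helm 3 version of the deploy step.
# There is no `helm init` and no Tiller pod to wait for in Helm 3.
deploy_efs_csi_driver() {
  kubeconfig="${1:-./cluster_config}"
  chart_url="https://github.com/kubernetes-sigs/aws-efs-csi-driver/releases/download/v0.3.0/helm-chart.tgz"

  # The chart is installed straight from its URL, so no chart repository
  # has to be added or updated first.
  helm upgrade --install aws-efs-csi-driver "$chart_url" \
    --force --kubeconfig "$kubeconfig"
}
```

The helm repo update step is left out on purpose: a fresh Helm 3 client has no repositories configured, and the chart above does not come from one.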

Answer:

If you specify the --client-only flag, helm init only sets up the local client and never installs Tiller in the cluster, hence your problem. Remove the flag and it should work.

See the docs for more details.

Update: Based on the output of helm version --tls and the discussion in the comments, the Tiller service is not working and its pods are stuck in ImagePullBackOff. Fixing that will fix this issue.
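
To confirm the pod state yourself, something like the following may help. It is a sketch that assumes Tiller lives in kube-system with the labels app=helm,name=tiller (the same selector that appears in the error output in the question); the function name inspect_tiller is hypothetical.

```shell
# Hypothetical helper: show the Tiller pods and the events behind their status.
inspect_tiller() {
  kubeconfig="${1:-./cluster_config}"

  # STATUS column shows whether the pod is Running or failing to pull its image.
  kubectl --kubeconfig "$kubeconfig" -n kube-system get pods -l app=helm,name=tiller

  # The Events section at the end names the image that could not be pulled.
  kubectl --kubeconfig "$kubeconfig" -n kube-system describe pods -l app=helm,name=tiller
}
```

Run inspect_tiller ./cluster_config and look at the STATUS column first; if no pods are listed at all, Tiller was never installed, which points back at --client-only.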

The Tiller pods can no longer pull their image because the upstream registry has been decommissioned. You can change the image name to omio/gcr.io.kubernetes-helm.tiller:v2.16.1. This image is hosted on a registry that still works for Helm v2.
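
One way to change the image on an existing install is sketched below. It assumes the default Helm v2 layout: a deployment named tiller-deploy in kube-system whose container is named tiller. If your install was customized, those names will differ. The function name set_tiller_image is hypothetical.

```shell
# Hypothetical helper: point the existing Tiller deployment at another image
# and wait for the new pod to become ready.
set_tiller_image() {
  kubeconfig="${1:-./cluster_config}"
  image="${2:-omio/gcr.io.kubernetes-helm.tiller:v2.16.1}"

  # Assumes the default deployment name (tiller-deploy) and container name (tiller).
  kubectl --kubeconfig "$kubeconfig" -n kube-system \
    set image deployment/tiller-deploy "tiller=$image"

  # Blocks until the rollout finishes or fails.
  kubectl --kubeconfig "$kubeconfig" -n kube-system \
    rollout status deployment/tiller-deploy
}
```

If Tiller is installed by helm init rather than patched afterwards, the --tiller-image flag of helm init is the equivalent knob.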