Stuck in a partial Helm release when applying Terraform to Kubernetes

I'm trying to apply a Terraform resource (helm_release) to k8s, and the apply command failed halfway through.

I checked the pod issue, and now I need to update some values in the local chart.

Now I'm in a dilemma: I can't apply the helm_release because the name is in use, and I can't destroy the helm_release because it was never fully created.

It seems to me the only option is to manually delete the k8s resources that were created by the helm_release chart. Is that right?

Here is the terraform for helm_release:

cat nginx-arm64.tf 

resource "helm_release" "nginx-ingress" {
  name  = "nginx-ingress"
  chart = "/data/terraform/k8s/nginx-ingress-controller-arm64.tgz"
}

BTW: I need to use the local chart because the official chart does not support the ARM64 architecture. Thanks.

Edit #1:

Here is the list of Helm releases; there is no nginx-ingress:

/data/terraform/k8s$ helm list -A
NAME            NAMESPACE   REVISION    UPDATED                                 STATUS      CHART               APP VERSION
cert-manager    default     1           2021-12-08 20:57:38.979176622 +0000 UTC deployed    cert-manager-v1.5.0 v1.5.0
/data/terraform/k8s$

Here is the describe pod output:

$ k describe pod/nginx-ingress-nginx-ingress-controller-99cddc76b-62nsr
Name:         nginx-ingress-nginx-ingress-controller-99cddc76b-62nsr
Namespace:    default
Priority:     0
Node:         ocifreevmalways/10.0.0.189
Start Time:   Wed, 08 Dec 2021 11:11:59 +0000
Labels:       app.kubernetes.io/component=controller
              app.kubernetes.io/instance=nginx-ingress
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=nginx-ingress-controller
              helm.sh/chart=nginx-ingress-controller-9.0.9
              pod-template-hash=99cddc76b
Annotations:  <none>
Status:       Running
IP:           10.244.0.22
IPs:
  IP:           10.244.0.22
Controlled By:  ReplicaSet/nginx-ingress-nginx-ingress-controller-99cddc76b
Containers:
  controller:
    Container ID:  docker://0b75f5f68ef35dfb7dc5b90f9d1c249fad692855159f4e969324fc4e2ee61654
    Image:         docker.io/rancher/nginx-ingress-controller:nginx-1.1.0-rancher1
    Image ID:      docker-pullable://rancher/nginx-ingress-controller@sha256:177fb5dc79adcd16cb6c15d6c42cef31988b116cb148845893b6b954d7d593bc
    Ports:         80/TCP, 443/TCP
    Host Ports:    0/TCP, 0/TCP
    Args:
      /nginx-ingress-controller
      --default-backend-service=default/nginx-ingress-nginx-ingress-controller-default-backend
      --election-id=ingress-controller-leader
      --controller-class=k8s.io/ingress-nginx
      --configmap=default/nginx-ingress-nginx-ingress-controller
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    255
      Started:      Wed, 08 Dec 2021 22:02:15 +0000
      Finished:     Wed, 08 Dec 2021 22:02:15 +0000
    Ready:          False
    Restart Count:  132
    Liveness:       http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Readiness:      http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      POD_NAME:       nginx-ingress-nginx-ingress-controller-99cddc76b-62nsr (v1:metadata.name)
      POD_NAMESPACE:  default (v1:metadata.namespace)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-wzqqn (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  kube-api-access-wzqqn:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason   Age                     From     Message
  ----     ------   ----                    ----     -------
  Normal   Pulled   8m38s (x132 over 10h)   kubelet  Container image "docker.io/rancher/nginx-ingress-controller:nginx-1.1.0-rancher1" already present on machine
  Warning  BackOff  3m39s (x3201 over 10h)  kubelet  Back-off restarting failed container

terraform state list shows nothing:

/data/terraform/k8s$ t state list
/data/terraform/k8s$

However, terraform.tfstate.backup does show the nginx-ingress release (I guess I ran the destroy command in between?):

/data/terraform/k8s$ cat terraform.tfstate.backup
{
  "version": 4,
  "terraform_version": "1.0.11",
  "serial": 28,
  "lineage": "30e74aa5-9631-f82f-61a2-7bdbd97c2276",
  "outputs": {},
  "resources": [
    {
      "mode": "managed",
      "type": "helm_release",
      "name": "nginx-ingress",
      "provider": "provider[\"registry.terraform.io/hashicorp/helm\"]",
      "instances": [
        {
          "status": "tainted",
          "schema_version": 0,
          "attributes": {
            "atomic": false,
            "chart": "/data/terraform/k8s/nginx-ingress-controller-arm64.tgz",
            "cleanup_on_fail": false,
            "create_namespace": false,
            "dependency_update": false,
            "description": null,
            "devel": null,
            "disable_crd_hooks": false,
            "disable_openapi_validation": false,
            "disable_webhooks": false,
            "force_update": false,
            "id": "nginx-ingress",
            "keyring": null,
            "lint": false,
            "manifest": null,
            "max_history": 0,
            "metadata": [
              {
                "app_version": "1.1.0",
                "chart": "nginx-ingress-controller",
                "name": "nginx-ingress",
                "namespace": "default",
                "revision": 1,
                "values": "{}",
                "version": "9.0.9"
              }
            ],
            "name": "nginx-ingress",
            "namespace": "default",
            "postrender": [],
            "recreate_pods": false,
            "render_subchart_notes": true,
            "replace": false,
            "repository": null,
            "repository_ca_file": null,
            "repository_cert_file": null,
            "repository_key_file": null,
            "repository_password": null,
            "repository_username": null,
            "reset_values": false,
            "reuse_values": false,
            "set": [],
            "set_sensitive": [],
            "skip_crds": false,
            "status": "failed",
            "timeout": 300,
            "values": null,
            "verify": false,
            "version": "9.0.9",
            "wait": true,
            "wait_for_jobs": false
          },
          "sensitive_attributes": [],
          "private": "bnVsbA=="
        }
      ]
    }
  ]
}

When I try to apply in the same directory, it fails with the same error again:

Plan: 1 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

helm_release.nginx-ingress: Creating...
╷
│ Error: cannot re-use a name that is still in use
│ 
│   with helm_release.nginx-ingress,
│   on nginx-arm64.tf line 1, in resource "helm_release" "nginx-ingress":
│    1: resource "helm_release" "nginx-ingress" {

Please share your thoughts. Thanks.

Edit #2:

The DEBUG logs show some more clues:

2021-12-09T04:30:14.118Z [INFO]  provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 [DEBUG] [resourceDiff: nginx-ingress] Release validated: timestamp=2021-12-09T04:30:14.118Z
2021-12-09T04:30:14.118Z [INFO]  provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 [DEBUG] [resourceDiff: nginx-ingress] Done: timestamp=2021-12-09T04:30:14.118Z
2021-12-09T04:30:14.119Z [WARN]  Provider "registry.terraform.io/hashicorp/helm" produced an invalid plan for helm_release.nginx-ingress, but we are tolerating it because it is using the legacy plugin SDK.
    The following problems may be the cause of any confusing errors from downstream operations:
      - .cleanup_on_fail: planned value cty.False for a non-computed attribute
      - .create_namespace: planned value cty.False for a non-computed attribute
      - .verify: planned value cty.False for a non-computed attribute
      - .recreate_pods: planned value cty.False for a non-computed attribute
      - .render_subchart_notes: planned value cty.True for a non-computed attribute
      - .replace: planned value cty.False for a non-computed attribute
      - .reset_values: planned value cty.False for a non-computed attribute
      - .disable_crd_hooks: planned value cty.False for a non-computed attribute
      - .lint: planned value cty.False for a non-computed attribute
      - .namespace: planned value cty.StringVal("default") for a non-computed attribute
      - .skip_crds: planned value cty.False for a non-computed attribute
      - .disable_webhooks: planned value cty.False for a non-computed attribute
      - .force_update: planned value cty.False for a non-computed attribute
      - .timeout: planned value cty.NumberIntVal(300) for a non-computed attribute
      - .reuse_values: planned value cty.False for a non-computed attribute
      - .dependency_update: planned value cty.False for a non-computed attribute
      - .disable_openapi_validation: planned value cty.False for a non-computed attribute
      - .atomic: planned value cty.False for a non-computed attribute
      - .wait: planned value cty.True for a non-computed attribute
      - .max_history: planned value cty.NumberIntVal(0) for a non-computed attribute
      - .wait_for_jobs: planned value cty.False for a non-computed attribute
helm_release.nginx-ingress: Creating...
2021-12-09T04:30:14.119Z [INFO]  Starting apply for helm_release.nginx-ingress
2021-12-09T04:30:14.119Z [INFO]  Starting apply for helm_release.nginx-ingress
2021-12-09T04:30:14.119Z [DEBUG] helm_release.nginx-ingress: applying the planned Create change
2021-12-09T04:30:14.120Z [INFO]  provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 [DEBUG] setting computed for "metadata" from ComputedKeys: timestamp=2021-12-09T04:30:14.120Z
2021-12-09T04:30:14.120Z [INFO]  provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 [DEBUG] [resourceReleaseCreate: nginx-ingress] Started: timestamp=2021-12-09T04:30:14.120Z
2021-12-09T04:30:14.120Z [INFO]  provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 [DEBUG] [resourceReleaseCreate: nginx-ingress] Getting helm configuration: timestamp=2021-12-09T04:30:14.120Z
2021-12-09T04:30:14.120Z [INFO]  provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 [DEBUG] [INFO] GetHelmConfiguration start: timestamp=2021-12-09T04:30:14.120Z
2021-12-09T04:30:14.120Z [INFO]  provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 [DEBUG] Using kubeconfig: /home/ubuntu/.kube/config: timestamp=2021-12-09T04:30:14.120Z
2021-12-09T04:30:14.120Z [INFO]  provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 [INFO] Successfully initialized kubernetes config: timestamp=2021-12-09T04:30:14.120Z
2021-12-09T04:30:14.121Z [INFO]  provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 [DEBUG] [INFO] GetHelmConfiguration success: timestamp=2021-12-09T04:30:14.121Z
2021-12-09T04:30:14.121Z [INFO]  provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 [DEBUG] [resourceReleaseCreate: nginx-ingress] Getting chart: timestamp=2021-12-09T04:30:14.121Z
2021-12-09T04:30:14.125Z [INFO]  provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 [DEBUG] [resourceReleaseCreate: nginx-ingress] Preparing for installation: timestamp=2021-12-09T04:30:14.125Z
2021-12-09T04:30:14.125Z [INFO]  provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 ---[ values.yaml ]-----------------------------------
{}: timestamp=2021-12-09T04:30:14.125Z
2021-12-09T04:30:14.125Z [INFO]  provider.terraform-provider-helm_v2.4.1_x5: 2021/12/09 04:30:14 [DEBUG] [resourceReleaseCreate: nginx-ingress] Installing chart: timestamp=2021-12-09T04:30:14.125Z
╷
│ Error: cannot re-use a name that is still in use
│ 
│   with helm_release.nginx-ingress,
│   on nginx-arm64.tf line 1, in resource "helm_release" "nginx-ingress":
│    1: resource "helm_release" "nginx-ingress" {
│ 
╵
2021-12-09T04:30:14.158Z [DEBUG] provider.stdio: received EOF, stopping recv loop: err="rpc error: code = Unavailable desc = transport is closing"
2021-12-09T04:30:14.160Z [DEBUG] provider: plugin process exited: path=.terraform/providers/registry.terraform.io/hashicorp/helm/2.4.1/linux_arm64/terraform-provider-helm_v2.4.1_x5 pid=558800
2021-12-09T04:30:14.160Z [DEBUG] provider: plugin exited

Answer:

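You don't have to delete all the resources manually with kubectl. Under the hood, the Terraform Helm provider still uses Helm, so helm list -A shows the Helm releases on your cluster. If nginx-ingress is missing from that output, as in your Edit #1, run helm list -a -A instead: by default helm list only shows releases in the deployed or failed state, while -a (--all) removes that filter and also shows releases stuck in a pending or uninstalling state. Once you have found the release, delete it with helm uninstall nginx-ingress -n REPLACE_WITH_YOUR_NAMESPACE.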
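As a rough sketch of that cleanup step, the commands could be wrapped like this. The release name and the default namespace are taken from the question; the DRY_RUN switch and the run and cleanup_release helpers are hypothetical names introduced here so that nothing touches the cluster until you ask for it.

```shell
#!/usr/bin/env bash
# Sketch only: locate and remove a stuck Helm release.
# RELEASE and NAMESPACE default to the values shown in the question.
# DRY_RUN is a hypothetical safety switch: 1 = only print, 0 = execute.
RELEASE="${RELEASE:-nginx-ingress}"
NAMESPACE="${NAMESPACE:-default}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command, and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

cleanup_release() {
  # --all lists releases in every state, not just deployed or failed ones.
  run helm list --all --namespace "$NAMESPACE"
  # Removes the release together with the resources Helm created for it.
  run helm uninstall "$RELEASE" --namespace "$NAMESPACE"
}
```

Calling cleanup_release as-is only prints the two helm commands. Set DRY_RUN=0 before calling it once the printed release name and namespace look right.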

Before re-running terraform apply, check whether the Helm release is still in your Terraform state with terraform state list (run it from the same directory where you run terraform apply). If helm_release.nginx-ingress is not in that list, it is not in your Terraform state and you can simply rerun terraform apply. Otherwise, remove it first with terraform state rm helm_release.nginx-ingress, and then run terraform apply again.
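The state check described above could be scripted roughly as follows. This is a sketch, assuming it is run from the directory that holds the Terraform configuration; ADDRESS, in_state and reconcile_state are hypothetical names, and only the resource address itself comes from the question.

```shell
#!/usr/bin/env bash
# Sketch only: drop a stale helm_release entry from the state, then re-apply.
# ADDRESS is the resource address used in the question's configuration.
ADDRESS="${ADDRESS:-helm_release.nginx-ingress}"

# Reads `terraform state list` output on stdin and succeeds only if
# the address given as $1 appears as an exact, whole line.
in_state() {
  grep -Fxq -- "$1"
}

reconcile_state() {
  if terraform state list | in_state "$ADDRESS"; then
    # The resource is still tracked: forget it without touching the cluster.
    terraform state rm "$ADDRESS"
  fi
  # With the Helm release uninstalled and the state clean, create it again.
  terraform apply
}
```

Nothing runs until reconcile_state is called. The exact-line match is deliberate, so that an address such as helm_release.nginx-ingress-internal would not be mistaken for helm_release.nginx-ingress.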