Unschedulable Kubernetes pods on GCP using Autoscaler


I have a Kubernetes cluster on GKE Autopilot whose pods are autoscaled. They suddenly stopped autoscaling. I'm new to Kubernetes and don't know exactly what to do, or which console output I should post to get help.

The pods are marked Unschedulable and stay in the Pending state instead of Running, so I can't open a shell in them or interact with them.

I also can't delete or stop them from the GCP Console. I don't think insufficient memory or CPU is the problem, because not much is running on the cluster.

The cluster was working as expected before this issue appeared.

Namespace:      default
Priority:       0
Node:           <none>
Labels:         app=odoo-service
Annotations:    seccomp.security.alpha.kubernetes.io/pod: runtime/default
Status:         Pending
IPs:            <none>
Controlled By:  ReplicaSet/odoo-cluster-dev-5bd88899d7
    Image:      us-central1-docker.pkg.dev/adams-dev/adams-odoo/odoo-service:v58
    Port:       <none>
    Host Port:  <none>
      cpu:                2
      ephemeral-storage:  1Gi
      memory:             8Gi
      cpu:                2
      ephemeral-storage:  1Gi
      memory:             8Gi
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zqh5r (ro)
    Image:      gcr.io/cloudsql-docker/gce-proxy:1.17
    Port:       <none>
    Host Port:  <none>
      cpu:                1
      ephemeral-storage:  1Gi
      memory:             2Gi
      cpu:                1
      ephemeral-storage:  1Gi
      memory:             2Gi
    Environment:          <none>
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zqh5r (ro)
  Type           Status
  PodScheduled   False 
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Guaranteed
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
  Type     Reason             Age                     From                                   Message
  ----     ------             ----                    ----                                   -------
  Normal   NotTriggerScaleUp  28m (x248 over 3h53m)   cluster-autoscaler                     pod didn't trigger scale-up (it wouldn't fit if a new node is added): 4 in backoff after failed scale-up, 2 Insufficient cpu, 2 Insufficient memory
  Normal   NotTriggerScaleUp  8m1s (x261 over 3h55m)  cluster-autoscaler                     pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 Insufficient memory, 4 in backoff after failed scale-up, 2 Insufficient cpu
  Normal   NotTriggerScaleUp  3m (x1646 over 3h56m)   cluster-autoscaler                     pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 Insufficient cpu, 2 Insufficient memory, 4 in backoff after failed scale-up
  Warning  FailedScheduling   20s (x168 over 3h56m)   gke.io/optimize-utilization-scheduler  0/2 nodes are available: 2 Insufficient cpu, 2 Insufficient memory.

  Type     Reason             Age                      From                                   Message
  ----     ------             ----                     ----                                   -------
  Normal   NotTriggerScaleUp  28m (x250 over 3h56m)    cluster-autoscaler                     pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 Insufficient memory, 4 in backoff after failed scale-up, 2 Insufficient cpu
  Normal   NotTriggerScaleUp  8m2s (x300 over 3h55m)   cluster-autoscaler                     pod didn't trigger scale-up (it wouldn't fit if a new node is added): 4 in backoff after failed scale-up, 2 Insufficient cpu, 2 Insufficient memory
  Warning  FailedScheduling   5m21s (x164 over 3h56m)  gke.io/optimize-utilization-scheduler  0/2 nodes are available: 2 Insufficient cpu, 2 Insufficient memory.
  Normal   NotTriggerScaleUp  3m1s (x1616 over 3h55m)  cluster-autoscaler                     pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 Insufficient cpu, 2 Insufficient memory, 4 in backoff after failed scale-up

I don't know how to debug or fix this.

CodePudding user response:

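The pods failed to schedule because neither of the 2 existing nodes has enough free CPU and memory for them (the FailedScheduling event reports "0/2 nodes are available: 2 Insufficient cpu, 2 Insufficient memory").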
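To see how much room one pod needs, the per-container requests from the describe output in the question can be added up, since the scheduler places a pod based on the sum of its containers' requests. This is a minimal sketch: the dictionary keys are labels taken from the image names (the container names are not shown in the output), and the numbers are copied from the question.

```python
# Requests per container, copied from the `kubectl describe pod` output.
# Keys are illustrative labels derived from the image names.
containers = {
    "odoo-service": {"cpu": 2, "memory_gib": 8, "ephemeral_storage_gib": 1},
    "gce-proxy": {"cpu": 1, "memory_gib": 2, "ephemeral_storage_gib": 1},
}


def pod_requests(containers):
    """Sum each resource across all containers of one pod."""
    totals = {}
    for resources in containers.values():
        for name, amount in resources.items():
            totals[name] = totals.get(name, 0) + amount
    return totals


totals = pod_requests(containers)
print(totals)
# {'cpu': 3, 'memory_gib': 10, 'ephemeral_storage_gib': 2}
```

So every replica needs a node with 3 vCPU and 10 GiB of memory free, which neither of the two current nodes can offer.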

The cluster autoscaler tried to scale up but went into backoff after a failed scale-up attempt, which indicates a possible problem with scaling up the managed instance groups that make up the node pool.
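The reasons are packed into a single event message, and their order changes from event to event, which makes the log look noisier than it is. As a rough sketch, the message can be split into a count per reason. The helper name `parse_reasons` is made up for this example, and the parsing assumes the "N reason, N reason" layout seen in the events in the question.

```python
import re


def parse_reasons(message):
    """Split an event message into a {reason: count} dictionary."""
    # The comma-separated reasons follow the last ": " in the message.
    reasons_part = message.rsplit(": ", 1)[-1].rstrip(".")
    counts = {}
    for item in reasons_part.split(", "):
        match = re.match(r"(\d+) (.+)", item.strip())
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts


message = (
    "pod didn't trigger scale-up (it wouldn't fit if a new node is added): "
    "4 in backoff after failed scale-up, 2 Insufficient cpu, 2 Insufficient memory"
)
print(parse_reasons(message))
# {'in backoff after failed scale-up': 4, 'Insufficient cpu': 2, 'Insufficient memory': 2}
```

Parsed this way, all the NotTriggerScaleUp events in the question carry the same three reasons, and "in backoff after failed scale-up" is the one that points at a failed attempt to add nodes.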

The most likely reason the scale-up failed is that a Compute Engine quota limit has been reached, in which case no new nodes can be added.
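One way to check this is to look at the regional quotas, for example from the JSON printed by `gcloud compute regions describe us-central1 --format=json`, which contains a `quotas` list with `metric`, `limit` and `usage` fields. The sketch below flags quotas that are close to their limit. The function name, the 90% threshold, and the sample numbers are all made up for illustration; they are not values from the cluster in the question.

```python
import json


def quotas_near_limit(region, threshold=0.9):
    """Return (metric, usage, limit) for quotas at or above the threshold."""
    flagged = []
    for quota in region.get("quotas", []):
        limit = quota["limit"]
        usage = quota["usage"]
        # Skip quotas with a zero limit to avoid dividing by zero.
        if limit > 0 and usage / limit >= threshold:
            flagged.append((quota["metric"], usage, limit))
    return flagged


# Hypothetical sample shaped like the region's JSON; the values are invented.
sample = json.loads("""
{"quotas": [
  {"metric": "CPUS", "limit": 24.0, "usage": 24.0},
  {"metric": "IN_USE_ADDRESSES", "limit": 8.0, "usage": 2.0}
]}
""")
print(quotas_near_limit(sample))
# [('CPUS', 24.0, 24.0)]
```

If a quota such as CPUS shows up here for the cluster's region, requesting a quota increase or moving to a region with headroom would be the next step.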

Note that the VMs behind an Autopilot cluster are not visible to you, but they are still counted against your quota.

Try creating the Autopilot cluster in another region. If an Autopilot cluster no longer meets your needs, switch to a Standard cluster.

