I have created a chart with Helm that contains one deployment that launches a Docker image containing multiple Discord bots written in Python. Right now the image only contains one bot, but it could contain more in the future. When the chart is deployed onto my GCP cluster, the pod fails to schedule and the cluster attempts to scale up:
```
Events:
  Type     Reason            Age  From                                   Message
  ----     ------            ---- ----                                   -------
  Warning  FailedScheduling  25s  gke.io/optimize-utilization-scheduler  0/4 nodes are available: 4 Insufficient cpu, 4 Insufficient memory.
  Normal   TriggeredScaleUp  18s  cluster-autoscaler                     pod triggered scale-up: [{https://www.googleapis.com/compute/v1/projects/ocorg-334906/zones/northamerica-northeast1-b/instanceGroups/gk3-federado-default-pool-a7b1a13b-grp 2->3 (max: 1000)}]
```
I don't see any reason it should do so, other than perhaps a slight CPU/memory spike while the Docker container starts up. Either way, 4 nodes seems excessive. Plus, wouldn't it just try to spin up the one pod again once the usage dies down?
Also, this is complicated by the fact that this is a somewhat recent problem. I have another deployment that works correctly, but it was recently having trouble scaling up as well (which I solved by restarting the cluster in a different Google region). Could it be that the region simply doesn't have enough resources to spin up even one pod?
I have autoscaling disabled in my Helm chart, but I know that GCP has the cluster autoscaler enabled automatically. Is there any way to disable it so it doesn't keep trying and failing to allocate more resources?
I tried to deploy the chart and get a pod running, but instead the cluster attempts to scale up and the pod never runs.
CodePudding user response:
These messages are coming from the cluster autoscaler, a component which creates and destroys nodes. It's not anything in your Helm chart or Kubernetes YAML, and it's separate from the horizontal pod autoscaler if you have that configured.
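(For comparison, a horizontal pod autoscaler is something you would have had to create explicitly, for example a manifest roughly like the sketch below, with hypothetical names. It scales the number of pod replicas, not nodes, so it is not what is producing these events.)

```yaml
# Illustrative HPA only -- scales Deployment replicas, not cluster nodes
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: discord-bots          # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: discord-bots
  minReplicas: 1
  maxReplicas: 3
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
```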
Those messages are telling you:
- The cluster currently (before you've installed anything) is running 4 nodes
- The pod's configured `resources: { requests: }` (see the sketch after this list) don't fit on any of the existing nodes; no node has either enough CPU or enough memory to fit it
- The cluster autoscaler believes the pod will fit on a new node
- A new node is being provisioned, with the goal that this pod will run there
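To make the second point concrete, here is a minimal sketch of the kind of Deployment a chart like this might render, assuming hypothetical names and deliberately oversized requests (none of these values come from your chart):

```yaml
# Hypothetical rendered Deployment -- names, image, and numbers are illustrative
apiVersion: apps/v1
kind: Deployment
metadata:
  name: discord-bots
spec:
  replicas: 1
  selector:
    matchLabels:
      app: discord-bots
  template:
    metadata:
      labels:
        app: discord-bots
    spec:
      containers:
        - name: bots
          image: gcr.io/example-project/discord-bots:latest   # placeholder image
          resources:
            requests:          # what the scheduler and the cluster autoscaler look at
              cpu: "2"         # deliberately large: no existing node can reserve this much
              memory: 4Gi
            limits:
              cpu: "2"
              memory: 4Gi
```

If no existing node has that much unreserved CPU and memory, you get exactly the `FailedScheduling` / `TriggeredScaleUp` pair shown above.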
Scheduling decisions like this are controlled only by resource requests, both the new pod's and those of the pods already on each node, not by actual usage. If a pod is using more memory than it requests, that won't block scheduling, but the pod could get evicted if the node runs out of memory.
So, the only thing this one pod is doing that affects the cluster is having large resource requests. If it only needs the additional memory during startup and you can tolerate it getting evicted during the startup phase, you could reduce the memory and CPU requests to only what it needs in steady-state runtime. (But note that some language runtimes are bad about returning unused memory to the OS; I don't know about Python specifically here but it's possible that the Kubernetes-observed memory usage will never decrease.)
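As a sketch of that trade-off, you could keep the requests at the steady-state footprint and leave the limits higher, so a startup spike is allowed but not reserved (the numbers below are placeholders, not measurements of your bot):

```yaml
# Container resources tuned for a small steady-state footprint -- numbers are placeholders
resources:
  requests:
    cpu: 100m        # what the scheduler reserves; keep this close to steady-state usage
    memory: 128Mi
  limits:
    cpu: 500m        # headroom for the startup spike; exceeding the CPU limit only throttles
    memory: 256Mi    # exceeding the memory limit gets the container OOM-killed
```

With requests this small, the pod is likely to fit on one of the nodes you already have, and the cluster autoscaler then has no reason to add another.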