AKS cluster autoscaler profile modification

Time:11-16

We are using AKS cluster version 1.19.11, and the user node pools where our application pods run are underutilized (only about 30% consumption). We are therefore thinking of optimizing cost by reducing the node count of those node pools.

We would like to know what should be considered when planning the node count decrease.

  • Assume that node utilisation can be estimated and calculated from the pods' requests values, with no need to consider the limit range, as the autoscaler is enabled.

  • Also, is it possible to change the cluster autoscaler profile property "scaleDownUtilizationThreshold": "0.5" to a higher value, and is it recommended to increase it to 70%?

CodePudding user response:

The assumption,

node utilisation can be estimated and calculated from the pods' requests values, with no need to consider the limit range, as the autoscaler is enabled

holds as long as you don't care which process or container gets evicted when a node runs short of resources. (If they are controlled by a Deployment, ReplicaSet, or StatefulSet, the workloads will be recreated on a new node scaled out by the autoscaler.)

However, in most cases you will have some kind of priority among your workloads, and you will want to set thresholds (limits) accordingly so that you don't have to deal with the kernel evicting important processes (possibly not the one that caused the starvation, but the one that was using the most resources at the moment the evaluation happened).

Also, is it possible to change the cluster autoscaler profile property "scaleDownUtilizationThreshold": "0.5" to a higher value, and is it recommended to increase it to 70%?

Yes. The cluster autoscaler profile setting scale-down-utilization-threshold can be updated with the command below. The value is written as a fraction rather than a percentage, so 70% is 0.7:

az aks update \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --cluster-autoscaler-profile scale-down-utilization-threshold=<threshold>

[Reference]

AKS uses node resources to help the node function as part of your cluster. This usage can create a discrepancy between your node's total resources and the allocatable resources in AKS. [Reference]

scale-down-utilization-threshold is the node utilization level, defined as the sum of requested resources divided by the allocatable capacity, below which a node can be considered for scale-down.
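
As a rough illustration of that definition, the sketch below computes the utilization of one resource on one node and compares it with a threshold. The helper names node_utilization and scale_down_candidate, and all the numbers, are made up for this example and are not part of az or the cluster autoscaler. The real autoscaler checks more than this single ratio before it removes a node.

```shell
#!/bin/sh
# Hypothetical helper: utilization = requested / allocatable.
# Both arguments must be in the same unit (e.g. CPU millicores).
node_utilization() {
  awk -v req="$1" -v alloc="$2" 'BEGIN { printf "%.2f\n", req / alloc }'
}

# Hypothetical helper: a node is a scale-down candidate only when its
# utilization is strictly below the threshold.
scale_down_candidate() {
  awk -v u="$1" -v t="$2" 'BEGIN {
    if (u < t) print "candidate"; else print "not a candidate"
  }'
}

# Illustrative numbers only: pods request 1200m CPU in total on a node
# that has 3860m CPU allocatable.
util=$(node_utilization 1200 3860)
echo "utilization: $util"
echo "threshold 0.5: $(scale_down_candidate "$util" 0.5)"
echo "threshold 0.7: $(scale_down_candidate "$util" 0.7)"
```

With these numbers the node falls below both thresholds. The difference shows up for a node at, say, 0.60 requested utilization: it is not a candidate at 0.5 but becomes one at 0.7, so a higher threshold makes scale-down more aggressive.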

Ultimately, no single best practice can be given here: it is your use case, architectural design, and requirements that dictate what the scale-down-utilization-threshold should be for the cluster autoscaler.
