Is there a way to use a specific threshold in autoscaler group?

Time:07-23

I have installed the cluster autoscaler on my cluster (running on AWS). It works fine, scaling up and down. However, I configured the scale-down threshold to be 0.4, meaning that any worker node whose CPU requests are below this value should be removed. What actually happens is that the autoscaler just takes the bigger value of CPU or memory.

This is my configuration:

    - command:
        - ./cluster-autoscaler
        - --cloud-provider=aws
        - --namespace=kube-system
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/{my-cluster-name}
        - --logtostderr=true
        - --scale-down-utilization-threshold=0.4
        - --skip-nodes-with-local-storage=false
        - --skip-nodes-with-system-pods=false
        - --stderrthreshold=info
        - --v=4

This is the autoscaler log:

I0720 07:48:48.243694       1 scale_down.go:421] Node ip-10-0-146-54.ec2.internal - memory utilization 0.674829
I0720 07:48:48.243705       1 scale_down.go:424] Node ip-10-0-146-54.ec2.internal is not suitable for removal - memory utilization too big (0.674829)
I0720 07:48:48.243719       1 scale_down.go:421] Node ip-10-0-144-198.ec2.internal - cpu utilization 0.873750
I0720 07:48:48.243741       1 scale_down.go:424] Node ip-10-0-144-198.ec2.internal is not suitable for removal - cpu utilization too big (0.873750)
I0720 07:48:48.243753       1 scale_down.go:421] Node ip-10-0-132-191.ec2.internal - cpu utilization 0.836250
I0720 07:48:48.243776       1 scale_down.go:424] Node ip-10-0-132-191.ec2.internal is not suitable for removal - cpu utilization too big (0.836250)
I0720 07:48:48.243796       1 scale_down.go:421] Node ip-10-0-158-56.ec2.internal - memory utilization 0.756398
I0720 07:48:48.243803       1 scale_down.go:424] Node ip-10-0-158-56.ec2.internal is not suitable for removal - memory utilization too big (0.756398)
I0720 07:48:48.243814       1 scale_down.go:421] Node ip-10-0-146-236.ec2.internal - memory utilization 0.471180
I0720 07:48:48.243821       1 scale_down.go:424] Node ip-10-0-146-236.ec2.internal is not suitable for removal - memory utilization too big (0.471180)
I0720 07:48:48.243831       1 scale_down.go:421] Node ip-10-0-141-80.ec2.internal - cpu utilization 0.911250
I0720 07:48:48.243837       1 scale_down.go:424] Node ip-10-0-141-80.ec2.internal is not suitable for removal - cpu utilization too big (0.911250)
I0720 07:48:48.243846       1 scale_down.go:421] Node ip-10-0-131-74.ec2.internal - cpu utilization 0.836250
I0720 07:48:48.243851       1 scale_down.go:424] Node ip-10-0-131-74.ec2.internal is not suitable for removal - cpu utilization too big (0.836250)
I0720 07:48:48.243860       1 scale_down.go:421] Node ip-10-0-135-213.ec2.internal - cpu utilization 0.836250
I0720 07:48:48.243865       1 scale_down.go:424] Node ip-10-0-135-213.ec2.internal is not suitable for removal - cpu utilization too big (0.836250)
I0720 07:48:48.243874       1 scale_down.go:421] Node ip-10-0-145-101.ec2.internal - cpu utilization 0.836250
I0720 07:48:48.243879       1 scale_down.go:424] Node ip-10-0-145-101.ec2.internal is not suitable for removal - cpu utilization too big (0.836250)
I0720 07:48:48.243891       1 scale_down.go:421] Node ip-10-0-149-91.ec2.internal - cpu utilization 0.886250
I0720 07:48:48.243897       1 scale_down.go:424] Node ip-10-0-149-91.ec2.internal is not suitable for removal - cpu utilization too big (0.886250)
I0720 07:48:48.243905       1 scale_down.go:421] Node ip-10-0-130-30.ec2.internal - memory utilization 0.559890
I0720 07:48:48.243913       1 scale_down.go:424] Node ip-10-0-130-30.ec2.internal is not suitable for removal - memory utilization too big (0.559890)
I0720 07:48:48.243924       1 scale_down.go:421] Node ip-10-0-145-37.ec2.internal - cpu utilization 0.836250
I0720 07:48:48.243933       1 scale_down.go:424] Node ip-10-0-145-37.ec2.internal is not suitable for removal - cpu utilization too big (0.836250)
I0720 07:48:48.243943       1 scale_down.go:421] Node ip-10-0-135-59.ec2.internal - cpu utilization 0.836250
I0720 07:48:48.243949       1 scale_down.go:424] Node ip-10-0-135-59.ec2.internal is not suitable for removal - cpu utilization too big (0.836250)
I0720 07:48:48.243957       1 scale_down.go:421] Node ip-10-0-145-80.ec2.internal - cpu utilization 0.898750
I0720 07:48:48.243964       1 scale_down.go:424] Node ip-10-0-145-80.ec2.internal is not suitable for removal - cpu utilization too big (0.898750)
I0720 07:48:48.243975       1 scale_down.go:421] Node ip-10-0-128-31.ec2.internal - cpu utilization 0.930000
I0720 07:48:48.243981       1 scale_down.go:424] Node ip-10-0-128-31.ec2.internal is not suitable for removal - cpu utilization too big (0.930000)
I0720 07:48:48.243988       1 scale_down.go:421] Node ip-10-0-150-103.ec2.internal - memory utilization 0.559890
I0720 07:48:48.244009       1 scale_down.go:424] Node ip-10-0-150-103.ec2.internal is not suitable for removal - memory utilization too big (0.559890)
I0720 07:48:48.244025       1 scale_down.go:421] Node ip-10-0-138-235.ec2.internal - cpu utilization 0.855000
I0720 07:48:48.244033       1 scale_down.go:424] Node ip-10-0-138-235.ec2.internal is not suitable for removal - cpu utilization too big (0.855000)
I0720 07:48:48.244044       1 scale_down.go:421] Node ip-10-0-139-155.ec2.internal - memory utilization 0.675887
I0720 07:48:48.244049       1 scale_down.go:424] Node ip-10-0-139-155.ec2.internal is not suitable for removal - memory utilization too big (0.675887)
I0720 07:48:48.244059       1 scale_down.go:421] Node ip-10-0-149-95.ec2.internal - memory utilization 0.512408
I0720 07:48:48.244065       1 scale_down.go:424] Node ip-10-0-149-95.ec2.internal is not suitable for removal - memory utilization too big (0.512408)
I0720 07:48:48.244073       1 scale_down.go:421] Node ip-10-0-146-35.ec2.internal - cpu utilization 0.836250
I0720 07:48:48.244079       1 scale_down.go:424] Node ip-10-0-146-35.ec2.internal is not suitable for removal - cpu utilization too big (0.836250)
I0720 07:48:48.244088       1 scale_down.go:421] Node ip-10-0-148-252.ec2.internal - cpu utilization 0.836250
I0720 07:48:48.244124       1 scale_down.go:424] Node ip-10-0-148-252.ec2.internal is not suitable for removal - cpu utilization too big (0.836250)
I0720 07:48:48.244141       1 scale_down.go:421] Node ip-10-0-154-233.ec2.internal - cpu utilization 0.996250
I0720 07:48:48.244149       1 scale_down.go:424] Node ip-10-0-154-233.ec2.internal is not suitable for removal - cpu utilization too big (0.996250)
I0720 07:48:48.244158       1 scale_down.go:421] Node ip-10-0-157-83.ec2.internal - cpu utilization 0.961250
I0720 07:48:48.244163       1 scale_down.go:424] Node ip-10-0-157-83.ec2.internal is not suitable for removal - cpu utilization too big (0.961250)
I0720 07:48:48.244175       1 scale_down.go:421] Node ip-10-0-159-144.ec2.internal - memory utilization 0.583811
I0720 07:48:48.244191       1 scale_down.go:424] Node ip-10-0-159-144.ec2.internal is not suitable for removal - memory utilization too big (0.583811)
I0720 07:48:48.244200       1 scale_down.go:421] Node ip-10-0-144-12.ec2.internal - cpu utilization 0.886250
I0720 07:48:48.244205       1 scale_down.go:424] Node ip-10-0-144-12.ec2.internal is not suitable for removal - cpu utilization too big (0.886250)
I0720 07:48:48.244215       1 scale_down.go:421] Node ip-10-0-156-220.ec2.internal - cpu utilization 0.836250
I0720 07:48:48.244222       1 scale_down.go:424] Node ip-10-0-156-220.ec2.internal is not suitable for removal - cpu utilization too big (0.836250)
I0720 07:48:48.244233       1 scale_down.go:421] Node ip-10-0-131-80.ec2.internal - cpu utilization 0.836250
I0720 07:48:48.244242       1 scale_down.go:424] Node ip-10-0-131-80.ec2.internal is not suitable for removal - cpu utilization too big (0.836250)
I0720 07:48:48.244257       1 scale_down.go:421] Node ip-10-0-140-90.ec2.internal - cpu utilization 0.955000
I0720 07:48:48.244274       1 scale_down.go:424] Node ip-10-0-140-90.ec2.internal is not suitable for removal - cpu utilization too big (0.955000)

Sometimes the autoscaler decides based on CPU utilization and sometimes on memory utilization. I would like it to use only CPU utilization when deciding which nodes to remove.

CodePudding user response:

This may not completely answer your question, but after investigating cluster-autoscaler for our own use, we found the following:

Scale-up:

  • The cluster-autoscaler obtains cluster metrics every 10 seconds to determine whether a scale-up or scale-down action is a potential option.
  • When one of the scaling criteria is met, it is detected and the relevant infrastructure is marked for a scaling event.

Scale-down:

  • When a node becomes under-utilised, it becomes a potential candidate for removal.
  • The evaluation period is 10 minutes. If the flagged nodes are still under-utilised after that time, they are removed, and any pods on a removed node are moved onto the remaining node(s).
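
The scale-down timing described above can be sketched roughly as follows. This is only an illustration of the "still under-utilised after 10 minutes" rule, not the autoscaler's actual code: the function names `update_unneeded` and `nodes_to_remove` and the dictionary of timestamps are hypothetical.

```python
from typing import Dict, List, Set

# The 10-minute evaluation period mentioned above, in seconds.
UNNEEDED_SECONDS = 10 * 60


def update_unneeded(
    unneeded_since: Dict[str, float],
    underutilised_now: Set[str],
    now: float,
) -> Dict[str, float]:
    """Track when each node was first seen as under-utilised.

    Nodes that are still under-utilised keep their original timestamp,
    newly under-utilised nodes get `now`, and nodes that recovered are dropped.
    """
    return {name: unneeded_since.get(name, now) for name in underutilised_now}


def nodes_to_remove(
    unneeded_since: Dict[str, float],
    now: float,
    unneeded_seconds: float = UNNEEDED_SECONDS,
) -> List[str]:
    """Return nodes that have been under-utilised for the whole period."""
    return sorted(
        name
        for name, since in unneeded_since.items()
        if now - since >= unneeded_seconds
    )
```

A node that recovers part-way through the period disappears from the tracking dictionary, so its 10-minute clock restarts if it becomes under-utilised again.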

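The choice between CPU and memory is probably not related to the 10-minute period. Judging by your log, the autoscaler reports, for each node, whichever of CPU or memory utilization is higher and compares that value with the threshold, so a node only becomes a scale-down candidate when both are below it.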
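
The log in the question fits a simple rule: for each node, take the larger of the CPU and memory request ratios and compare it with `--scale-down-utilization-threshold`. Below is a minimal Python sketch of that rule. The `Node` type, its fields, and both function names are hypothetical and only illustrate the comparison. They are not the autoscaler's API.

```python
from typing import NamedTuple, Tuple


class Node(NamedTuple):
    # Hypothetical fields: summed pod requests and node allocatable capacity.
    cpu_requested: float
    cpu_allocatable: float
    mem_requested: float
    mem_allocatable: float


def dominant_utilization(node: Node) -> Tuple[str, float]:
    """Return the resource with the higher requests/allocatable ratio."""
    cpu = node.cpu_requested / node.cpu_allocatable
    mem = node.mem_requested / node.mem_allocatable
    return ("cpu", cpu) if cpu >= mem else ("memory", mem)


def is_scale_down_candidate(node: Node, threshold: float = 0.4) -> bool:
    """A node qualifies only if its dominant utilization is below the threshold."""
    _, utilization = dominant_utilization(node)
    return utilization < threshold
```

Under this rule, a node with CPU at 0.25 and memory at 0.5 is reported as "memory utilization 0.5" and is kept at a threshold of 0.4, even though its CPU alone would qualify. That matches the mix of CPU and memory lines in the log.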

This page may be useful, but I couldn't see anything in it that answers your question directly: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#how-does-scale-down-work

Sorry, hope this is somewhat helpful :)
