I created an HPA on our k8s cluster which should autoscale at 90% memory utilization. However, it scales up without hitting the target percentage. I use the following config:
```yaml
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  namespace: {{ .Values.namespace }}
  name: {{ include "helm-generic.fullname" . }}
  labels:
    {{- include "helm-generic.labels" . | nindent 4 }}
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ include "helm-generic.fullname" . }}
  minReplicas: 1
  maxReplicas: 2
  metrics:
    - type: Resource
      resource:
        name: memory
        targetAverageUtilization: 90
```
With this config it creates 2 pods, which is the maxReplicas number. If I set maxReplicas to 4, it creates 3.

This is what I get from `kubectl describe hpa`:
```
$ kubectl describe hpa -n trunkline
Name:               test-v1
Namespace:          trunkline
Labels:             app.kubernetes.io/instance=test-v1
                    app.kubernetes.io/managed-by=Helm
                    app.kubernetes.io/name=helm-generic
                    app.kubernetes.io/version=0.0.0
                    helm.sh/chart=helm-generic-0.1.3
Annotations:        meta.helm.sh/release-name: test-v1
                    meta.helm.sh/release-namespace: trunkline
CreationTimestamp:  Wed, 12 Oct 2022 17:36:54 +0300
Reference:          Deployment/test-v1
Metrics:            ( current / target )
  resource memory on pods (as a percentage of request):  59% (402806784) / 90%
  resource cpu on pods (as a percentage of request):     11% (60m) / 80%
Min replicas:       1
Max replicas:       2
Deployment pods:    2 current / 2 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    ReadyForNewScale    recommended size matches current size
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from memory resource utilization (percentage of request)
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:             <none>
```
As you can see, the pods' memory utilization is 59% against a target of 90%, which I would expect to produce only 1 pod.
CodePudding user response:
This is working as intended.
`targetAverageUtilization` is a target for the *average* utilization over all of the matching Pods.
The idea of HPA is:
- Scale up? We have 2 Pods and average memory utilization is only 59%, which is under 90%, so there is no need to scale up.
- Scale down? 59% is the average across 2 Pods under the current load, so if a single Pod took all of that load its utilization would rise to 59% × 2 = 118%, which is over 90%. Scaling down would immediately require scaling back up, so the HPA does not scale down.
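The scale-down check above is easy to verify with a few lines of Python (a quick sketch of the arithmetic, not actual controller code):

```python
# 2 Pods averaging 59% utilization against a 90% target.
pods = 2
avg_utilization = 59.0
target = 90.0

# If the same load were carried by one fewer Pod, the average would rise to:
utilization_if_scaled_down = avg_utilization * pods / (pods - 1)
print(utilization_if_scaled_down)  # -> 118.0, over the 90% target, so no scale-down
```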
CodePudding user response:
The horizontal pod autoscaler has a very specific formula for calculating the target replica count:

```
desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
```
With the output you show, `currentMetricValue` is 59% and `desiredMetricValue` is 90%. Multiplying that ratio by the `currentReplicas` of 2, you get about 1.3 replicas, which gets rounded up to 2.
This formula, and especially the `ceil()` round-up behavior, can make the HPA very slow to scale down, especially with a small number of replicas.
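As a quick check of the arithmetic (plain Python mirroring the formula above, not the actual controller code):

```python
import math

def desired_replicas(current_replicas, current_metric, desired_metric):
    """The HPA scaling formula: ceil[currentReplicas * (current / desired)]."""
    return math.ceil(current_replicas * current_metric / desired_metric)

print(desired_replicas(2, 59, 90))  # -> 2: ceil(1.31), so 59% keeps both Pods
print(desired_replicas(2, 44, 90))  # -> 1: utilization must drop below 45% before 2 Pods become 1
```

Note how far below the 90% target the utilization has to fall (below 45%, i.e. target/currentReplicas) before the ceiling rounds down to 1.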
More broadly, autoscaling on Kubernetes-observable memory might not work the way you expect. Most programming languages are garbage-collected (C, C++, and Rust are the most notable exceptions), and garbage collectors as a rule tend to allocate a large block of operating-system memory and reuse it, rather than return it to the operating system when load decreases. If you have a Pod that reaches 90% memory from the Kubernetes point of view, it's possible that its memory usage will never decrease. You might need to autoscale on a different metric, or attach an external metrics system like Prometheus to get more detailed memory-manager statistics you can act on.