I am using an HPA based on a custom metric on GKE.
The HPA is not working and it's showing me this error log:
unable to fetch metrics from custom metrics API: the server is currently unable to handle the request
When I run kubectl get apiservices | grep custom I get:
v1beta1.custom.metrics.k8s.io services/prometheus-adapter False (FailedDiscoveryCheck) 135d
This is the HPA spec config:
spec:
  scaleTargetRef:
    kind: Deployment
    name: api-name
    apiVersion: apps/v1
  minReplicas: 3
  maxReplicas: 50
  metrics:
    - type: Object
      object:
        target:
          kind: Service
          name: api-name
          apiVersion: v1
        metricName: messages_ready_per_consumer
        targetValue: '1'
And this is the Service's spec config:
spec:
  ports:
    - name: worker-metrics
      protocol: TCP
      port: 8080
      targetPort: worker-metrics
  selector:
    app.kubernetes.io/instance: api
    app.kubernetes.io/name: api-name
  clusterIP: 10.8.7.9
  clusterIPs:
    - 10.8.7.9
  type: ClusterIP
  sessionAffinity: None
  ipFamilies:
    - IPv4
  ipFamilyPolicy: SingleStack
What should I do to make it work?
CodePudding user response:
What do you get for kubectl get pod -l "app.kubernetes.io/instance=api,app.kubernetes.io/name=api-name"?
There should be a pod to which the Service refers.
If there is a pod, check its logs with kubectl logs <pod-name>. You can add -f to the kubectl logs command to follow the logs.
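For reference, the checks could look like this (<pod-name> is a placeholder for one of the pods returned by the first command):

# List the pods that the Service's selector should match
kubectl get pod -l "app.kubernetes.io/instance=api,app.kubernetes.io/name=api-name"

# Inspect the logs of one of the matched pods; add -f to follow them
kubectl logs -f <pod-name>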
CodePudding user response:
First of all, confirm that the metrics-server Pod is running in your kube-system namespace. Also, you can use the following manifest:
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-server
  namespace: kube-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    k8s-app: metrics-server
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  template:
    metadata:
      name: metrics-server
      labels:
        k8s-app: metrics-server
    spec:
      serviceAccountName: metrics-server
      volumes:
        # mount in tmp so we can safely use from-scratch images and/or read-only containers
        - name: tmp-dir
          emptyDir: {}
      containers:
        - name: metrics-server
          image: k8s.gcr.io/metrics-server-amd64:v0.3.1
          command:
            - /metrics-server
            - --kubelet-insecure-tls
            - --kubelet-preferred-address-types=InternalIP
          imagePullPolicy: Always
          volumeMounts:
            - name: tmp-dir
              mountPath: /tmp
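To confirm the Pod is actually up (and to get at its logs for the next step), something along these lines should be enough; the label selector matches the manifest above:

# Check that the metrics-server pod is running in kube-system
kubectl get pods -n kube-system -l k8s-app=metrics-server

# Tail its recent logs
kubectl logs -n kube-system -l k8s-app=metrics-server --tail=100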
If so, take a look into the logs and look for any stackdriver adapter lines. This issue is commonly caused by a problem with the custom-metrics-stackdriver-adapter. It usually crashes in the metrics-server namespace. To solve that, use the resource from this URL, and for the deployment, use this image:
gcr.io/google-containers/custom-metrics-stackdriver-adapter:v0.10.1
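For illustration, the image swap would sit in the adapter's Deployment roughly like this (the container name here is an assumption taken from the standard adapter manifest, so verify it against yours):

# Fragment of the custom-metrics-stackdriver-adapter Deployment; only the image line changes
spec:
  template:
    spec:
      containers:
        - name: pod-custom-metrics-stackdriver-adapter  # verify the container name in your manifest
          image: gcr.io/google-containers/custom-metrics-stackdriver-adapter:v0.10.1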
Another common root cause of this is an OOM issue. In this case, adding more memory solves the problem.
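If the adapter is being OOMKilled, raising its memory limit in the Deployment is usually enough; the numbers below are only an example to adjust for your cluster:

# Example resources block for the adapter container
resources:
  requests:
    cpu: 250m
    memory: 200Mi
  limits:
    memory: 400Mi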
You can use the following threads for further reference: GKE - HPA using custom metrics - unable to fetch metrics, Stackdriver-metadata-agent-cluster-level gets OOMKilled, and Custom-metrics-stackdriver-adapter pod keeps crashing.