Strimzi Kafka Exporter: consumer group not showing up in Prometheus metric kafka_consumergroup_lag


I have a Strimzi Kafka cluster on GKE, with the Kafka Exporter deployed as well. The Kafka topic has ACLs, and a consumer group (spark-kafka-source-*) is defined which can read data from the topic.

I'm running a Spark Structured Streaming program that reads data from the Kafka topic. The issue is that the Kafka Exporter does not seem to show the consumer group when I check the metric kafka_consumergroup_lag.

The consumer group does show up in the metric kafka_consumergroup_members:

kafka_consumergroup_members{consumergroup="spark-kafka-source-657d6441-5716-43d9-b456-73657a5534a3-594190416-driver-0", container="versa-kafka-gke-kafka-exporter", endpoint="tcp-prometheus", instance="10.40.0.65:9404", job="monitoring/kafka-resources-metrics", kubernetes_pod_name="versa-kafka-gke-kafka-exporter-84c7ffbb79-jzqjn", namespace="kafka", node_ip="10.142.0.24", node_name="gke-versa-kafka-gke-default-pool-a92b23b7-n0x2", pod="versa-kafka-gke-kafka-exporter-84c7ffbb79-jzqjn", strimzi_io_cluster="versa-kafka-gke", strimzi_io_kind="Kafka", strimzi_io_name="versa-kafka-gke-kafka-exporter"}

Here are the YAMLs:

kafka-deployment.yaml (contains the kafkaExporter section)
-----------------------------------------------------

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: versa-kafka-gke #1
spec:
  kafka:
    version: 3.0.0
    replicas: 3
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
        authentication:
          type: tls  
      - name: external
        port: 9094
        type: loadbalancer
        tls: true 
        authentication:
          type: tls
    authorization:
      type: simple    
    readinessProbe:
      initialDelaySeconds: 15
      timeoutSeconds: 5
    livenessProbe:
      initialDelaySeconds: 15
      timeoutSeconds: 5     
    config:
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
      log.message.format.version: "3.0"
      inter.broker.protocol.version: "3.0"
    storage:
      type: jbod
      volumes:
      - id: 0
        type: persistent-claim
        size: 500Gi
        deleteClaim: false
    logging: #9
      type: inline
      loggers:
        kafka.root.logger.level: "INFO"
    metricsConfig:
      type: jmxPrometheusExporter
      valueFrom:
        configMapKeyRef:
          name: kafka-metrics
          key: kafka-metrics-config.yml    
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 2Gi
      deleteClaim: false
    resources:
      requests:
        memory: 1Gi
        cpu: "1"
      limits:
        memory: 2Gi
        cpu: "1.5"
    logging:
      type: inline
      loggers:
        zookeeper.root.logger: "INFO"
    metricsConfig:
      type: jmxPrometheusExporter
      valueFrom:
        configMapKeyRef:
          name: kafka-metrics
          key: zookeeper-metrics-config.yml    
  entityOperator: #11
    topicOperator: {}
    userOperator: {}
  kafkaExporter:
    topicRegex: ".*"
    groupRegex: ".*"

---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kafka-metrics
  labels:
    app: strimzi
data:
  kafka-metrics-config.yml: |
    # See https://github.com/prometheus/jmx_exporter for more info about JMX Prometheus Exporter metrics
    lowercaseOutputName: true
    rules:
    # Special cases and very specific rules
    - pattern: kafka.server<type=(.+), name=(.+), clientId=(.+), topic=(.+), partition=(.*)><>Value
      name: kafka_server_$1_$2
      type: GAUGE
      labels:
       clientId: "$3"
       topic: "$4"
       partition: "$5"
    - pattern: kafka.server<type=(.+), name=(.+), clientId=(.+), brokerHost=(.+), brokerPort=(.+)><>Value
      name: kafka_server_$1_$2
      type: GAUGE
      labels:
       clientId: "$3"
       broker: "$4:$5"
    - pattern: kafka.server<type=(.+), cipher=(.+), protocol=(.+), listener=(.+), networkProcessor=(.+)><>connections
      name: kafka_server_$1_connections_tls_info
      type: GAUGE
      labels:
        cipher: "$2"
        protocol: "$3"
        listener: "$4"
        networkProcessor: "$5"
    - pattern: kafka.server<type=(.+), clientSoftwareName=(.+), clientSoftwareVersion=(.+), listener=(.+), networkProcessor=(.+)><>connections
      name: kafka_server_$1_connections_software
      type: GAUGE
      labels:
        clientSoftwareName: "$2"
        clientSoftwareVersion: "$3"
        listener: "$4"
        networkProcessor: "$5"
    - pattern: "kafka.server<type=(.+), listener=(.+), networkProcessor=(.+)><>(.+):"
      name: kafka_server_$1_$4
      type: GAUGE
      labels:
       listener: "$2"
       networkProcessor: "$3"
    - pattern: kafka.server<type=(.+), listener=(.+), networkProcessor=(.+)><>(.+)
      name: kafka_server_$1_$4
      type: GAUGE
      labels:
       listener: "$2"
       networkProcessor: "$3"
    # Some percent metrics use MeanRate attribute
    # Ex) kafka.server<type=(KafkaRequestHandlerPool), name=(RequestHandlerAvgIdlePercent)><>MeanRate
    - pattern: kafka.(\w+)<type=(.+), name=(.+)Percent\w*><>MeanRate
      name: kafka_$1_$2_$3_percent
      type: GAUGE
    # Generic gauges for percents
    - pattern: kafka.(\w+)<type=(.+), name=(.+)Percent\w*><>Value
      name: kafka_$1_$2_$3_percent
      type: GAUGE
    - pattern: kafka.(\w+)<type=(.+), name=(.+)Percent\w*, (.+)=(.+)><>Value
      name: kafka_$1_$2_$3_percent
      type: GAUGE
      labels:
        "$4": "$5"
    # Generic per-second counters with 0-2 key/value pairs
    - pattern: kafka.(\w+)<type=(.+), name=(.+)PerSec\w*, (.+)=(.+), (.+)=(.+)><>Count
      name: kafka_$1_$2_$3_total
      type: COUNTER
      labels:
        "$4": "$5"
        "$6": "$7"
    - pattern: kafka.(\w+)<type=(.+), name=(.+)PerSec\w*, (.+)=(.+)><>Count
      name: kafka_$1_$2_$3_total
      type: COUNTER
      labels:
        "$4": "$5"
    - pattern: kafka.(\w+)<type=(.+), name=(.+)PerSec\w*><>Count
      name: kafka_$1_$2_$3_total
      type: COUNTER
    # Generic gauges with 0-2 key/value pairs
    - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.+), (.+)=(.+)><>Value
      name: kafka_$1_$2_$3
      type: GAUGE
      labels:
        "$4": "$5"
        "$6": "$7"
    - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.+)><>Value
      name: kafka_$1_$2_$3
      type: GAUGE
      labels:
        "$4": "$5"
    - pattern: kafka.(\w+)<type=(.+), name=(.+)><>Value
      name: kafka_$1_$2_$3
      type: GAUGE
    # Emulate Prometheus 'Summary' metrics for the exported 'Histogram's.
    # Note that these are missing the '_sum' metric!
    - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.+), (.+)=(.+)><>Count
      name: kafka_$1_$2_$3_count
      type: COUNTER
      labels:
        "$4": "$5"
        "$6": "$7"
    - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.*), (.+)=(.+)><>(\d+)thPercentile
      name: kafka_$1_$2_$3
      type: GAUGE
      labels:
        "$4": "$5"
        "$6": "$7"
        quantile: "0.$8"
    - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.+)><>Count
      name: kafka_$1_$2_$3_count
      type: COUNTER
      labels:
        "$4": "$5"
    - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.*)><>(\d+)thPercentile
      name: kafka_$1_$2_$3
      type: GAUGE
      labels:
        "$4": "$5"
        quantile: "0.$6"
    - pattern: kafka.(\w+)<type=(.+), name=(.+)><>Count
      name: kafka_$1_$2_$3_count
      type: COUNTER
    - pattern: kafka.(\w+)<type=(.+), name=(.+)><>(\d+)thPercentile
      name: kafka_$1_$2_$3
      type: GAUGE
      labels:
        quantile: "0.$4"
  zookeeper-metrics-config.yml: |
    # See https://github.com/prometheus/jmx_exporter for more info about JMX Prometheus Exporter metrics
    lowercaseOutputName: true
    rules:
    # replicated Zookeeper
    - pattern: "org.apache.ZooKeeperService<name0=ReplicatedServer_id(\\d+)><>(\\w+)"
      name: "zookeeper_$2"
      type: GAUGE
    - pattern: "org.apache.ZooKeeperService<name0=ReplicatedServer_id(\\d+), name1=replica.(\\d+)><>(\\w+)"
      name: "zookeeper_$3"
      type: GAUGE
      labels:
        replicaId: "$2"
    - pattern: "org.apache.ZooKeeperService<name0=ReplicatedServer_id(\\d+), name1=replica.(\\d+), name2=(\\w+)><>(Packets\\w+)"
      name: "zookeeper_$4"
      type: COUNTER
      labels:
        replicaId: "$2"
        memberType: "$3"
    - pattern: "org.apache.ZooKeeperService<name0=ReplicatedServer_id(\\d+), name1=replica.(\\d+), name2=(\\w+)><>(\\w+)"
      name: "zookeeper_$4"
      type: GAUGE
      labels:
        replicaId: "$2"
        memberType: "$3"
    - pattern: "org.apache.ZooKeeperService<name0=ReplicatedServer_id(\\d+), name1=replica.(\\d+), name2=(\\w+), name3=(\\w+)><>(\\w+)"
      name: "zookeeper_$4_$5"
      type: GAUGE
      labels:
        replicaId: "$2"
        memberType: "$3"


KafkaUser YAML:
---------

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaUser
metadata:
  name: syslog-vani-noacl
  labels:
    strimzi.io/cluster: versa-kafka-gke
spec:
  authentication:
    type: tls
  authorization:
    type: simple
    acls:
    # Topics and groups used by the HTTP clients through the HTTP Bridge
    # Change to match the topics used by your HTTP clients
    - resource:
        type: topic
        name: syslog.ueba-us4.v1.versa.demo3
        patternType: literal
      operation: Read
      host: "*"
    - resource:
        type: topic
        name: syslog.ueba-us4.v1.versa.demo3
        patternType: literal
      operation: Describe
      host: "*"
    - resource:
        type: topic
        name: syslog.ueba-us4.v1.versa.demo3
        patternType: literal
      operation: Write
      host: "*"
    - resource:
        type: group
        name: 'spark-kafka-source-'
        patternType: prefix
      operation: Read
      host: "*"
    - resource:
        type: group
        name: 'ss.consumer'
        patternType: literal
      operation: Read
      host: "*"
    - resource:
        type: group
        name: 'versa-console-consumer'
        patternType: literal
      operation: Read
      host: "*"

None of the consumer groups mentioned in the KafkaUser YAML are coming up in the metric kafka_consumergroup_lag.

Any ideas on how to debug/fix this?

Thanks in advance!


Please note: my Spark program is running on Dataproc (i.e. not on the Kubernetes cluster where Kafka is deployed). Does that affect how the Kafka Exporter shows the consumer group lag?

CodePudding user response:

The Kafka Exporter exports Prometheus metrics based on the committed consumer offsets in the __consumer_offsets topic. So when a consumer connects to your Kafka cluster, consumes some messages, and commits its offsets, the exporter will see them and show them in the metrics.

The KafkaUser CR, on the other hand, only lists the ACLs. It gives the user the right to use such a consumer group, but that does not mean the consumer group exists. Only once the user actually uses it and commits offsets will it show up.

So what you are seeing could be completely fine and expected.
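One way to confirm this is to check whether the group has ever committed offsets, using the kafka-consumer-groups.sh tool that ships with the brokers. A sketch, assuming the usual Strimzi broker pod naming convention (`<cluster>-kafka-<n>`), your `kafka` namespace, the plain listener on port 9092 from your config, and the group name taken from your kafka_consumergroup_members output:

```shell
# Describe the group's committed offsets from inside a broker pod.
# Pod name and namespace are assumptions based on Strimzi defaults;
# adjust them to match your cluster.
kubectl exec -n kafka versa-kafka-gke-kafka-0 -- \
  bin/kafka-consumer-groups.sh \
    --bootstrap-server localhost:9092 \
    --describe \
    --group spark-kafka-source-657d6441-5716-43d9-b456-73657a5534a3-594190416-driver-0

# If CURRENT-OFFSET shows "-" for every partition, the group has never
# committed any offsets to Kafka, so the exporter has no lag to report
# even though the group has active members.
```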
