I have a Strimzi Kafka cluster on GKE, with kafkaExporter deployed as well. The Kafka topic has ACLs, and a consumer group (spark-kafka-source-*) is defined which can read data from the topic.
I'm running a Spark Structured Streaming program which reads data from the Kafka topic. The issue is that Kafka Exporter does not seem to show the consumer group when I check the metric kafka_consumergroup_lag.
The consumer group does show up in the metric kafka_consumergroup_members:
kafka_consumergroup_members{consumergroup="spark-kafka-source-657d6441-5716-43d9-b456-73657a5534a3-594190416-driver-0", container="versa-kafka-gke-kafka-exporter", endpoint="tcp-prometheus", instance="10.40.0.65:9404", job="monitoring/kafka-resources-metrics", kubernetes_pod_name="versa-kafka-gke-kafka-exporter-84c7ffbb79-jzqjn", namespace="kafka", node_ip="10.142.0.24", node_name="gke-versa-kafka-gke-default-pool-a92b23b7-n0x2", pod="versa-kafka-gke-kafka-exporter-84c7ffbb79-jzqjn", strimzi_io_cluster="versa-kafka-gke", strimzi_io_kind="Kafka", strimzi_io_name="versa-kafka-gke-kafka-exporter"}
Here are the YAMLs:
kafka-deployment.yaml (contains the kafkaExporter tag)
-----------------------------------------------------
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: versa-kafka-gke #1
spec:
  kafka:
    version: 3.0.0
    replicas: 3
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
        authentication:
          type: tls
      - name: external
        port: 9094
        type: loadbalancer
        tls: true
        authentication:
          type: tls
    authorization:
      type: simple
    readinessProbe:
      initialDelaySeconds: 15
      timeoutSeconds: 5
    livenessProbe:
      initialDelaySeconds: 15
      timeoutSeconds: 5
    config:
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
      log.message.format.version: "3.0"
      inter.broker.protocol.version: "3.0"
    storage:
      type: jbod
      volumes:
        - id: 0
          type: persistent-claim
          size: 500Gi
          deleteClaim: false
    logging: #9
      type: inline
      loggers:
        kafka.root.logger.level: "INFO"
    metricsConfig:
      type: jmxPrometheusExporter
      valueFrom:
        configMapKeyRef:
          name: kafka-metrics
          key: kafka-metrics-config.yml
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 2Gi
      deleteClaim: false
    resources:
      requests:
        memory: 1Gi
        cpu: "1"
      limits:
        memory: 2Gi
        cpu: "1.5"
    logging:
      type: inline
      loggers:
        zookeeper.root.logger: "INFO"
    metricsConfig:
      type: jmxPrometheusExporter
      valueFrom:
        configMapKeyRef:
          name: kafka-metrics
          key: zookeeper-metrics-config.yml
  entityOperator: #11
    topicOperator: {}
    userOperator: {}
  kafkaExporter:
    topicRegex: ".*"
    groupRegex: ".*"
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kafka-metrics
  labels:
    app: strimzi
data:
  kafka-metrics-config.yml: |
    # See https://github.com/prometheus/jmx_exporter for more info about JMX Prometheus Exporter metrics
    lowercaseOutputName: true
    rules:
    # Special cases and very specific rules
    - pattern: kafka.server<type=(.+), name=(.+), clientId=(.+), topic=(.+), partition=(.*)><>Value
      name: kafka_server_$1_$2
      type: GAUGE
      labels:
        clientId: "$3"
        topic: "$4"
        partition: "$5"
    - pattern: kafka.server<type=(.+), name=(.+), clientId=(.+), brokerHost=(.+), brokerPort=(.+)><>Value
      name: kafka_server_$1_$2
      type: GAUGE
      labels:
        clientId: "$3"
        broker: "$4:$5"
    - pattern: kafka.server<type=(.+), cipher=(.+), protocol=(.+), listener=(.+), networkProcessor=(.+)><>connections
      name: kafka_server_$1_connections_tls_info
      type: GAUGE
      labels:
        cipher: "$2"
        protocol: "$3"
        listener: "$4"
        networkProcessor: "$5"
    - pattern: kafka.server<type=(.+), clientSoftwareName=(.+), clientSoftwareVersion=(.+), listener=(.+), networkProcessor=(.+)><>connections
      name: kafka_server_$1_connections_software
      type: GAUGE
      labels:
        clientSoftwareName: "$2"
        clientSoftwareVersion: "$3"
        listener: "$4"
        networkProcessor: "$5"
    - pattern: "kafka.server<type=(.+), listener=(.+), networkProcessor=(.+)><>(.+):"
      name: kafka_server_$1_$4
      type: GAUGE
      labels:
        listener: "$2"
        networkProcessor: "$3"
    - pattern: kafka.server<type=(.+), listener=(.+), networkProcessor=(.+)><>(.+)
      name: kafka_server_$1_$4
      type: GAUGE
      labels:
        listener: "$2"
        networkProcessor: "$3"
    # Some percent metrics use MeanRate attribute
    # Ex) kafka.server<type=(KafkaRequestHandlerPool), name=(RequestHandlerAvgIdlePercent)><>MeanRate
    - pattern: kafka.(\w+)<type=(.+), name=(.+)Percent\w*><>MeanRate
      name: kafka_$1_$2_$3_percent
      type: GAUGE
    # Generic gauges for percents
    - pattern: kafka.(\w+)<type=(.+), name=(.+)Percent\w*><>Value
      name: kafka_$1_$2_$3_percent
      type: GAUGE
    - pattern: kafka.(\w+)<type=(.+), name=(.+)Percent\w*, (.+)=(.+)><>Value
      name: kafka_$1_$2_$3_percent
      type: GAUGE
      labels:
        "$4": "$5"
    # Generic per-second counters with 0-2 key/value pairs
    - pattern: kafka.(\w+)<type=(.+), name=(.+)PerSec\w*, (.+)=(.+), (.+)=(.+)><>Count
      name: kafka_$1_$2_$3_total
      type: COUNTER
      labels:
        "$4": "$5"
        "$6": "$7"
    - pattern: kafka.(\w+)<type=(.+), name=(.+)PerSec\w*, (.+)=(.+)><>Count
      name: kafka_$1_$2_$3_total
      type: COUNTER
      labels:
        "$4": "$5"
    - pattern: kafka.(\w+)<type=(.+), name=(.+)PerSec\w*><>Count
      name: kafka_$1_$2_$3_total
      type: COUNTER
    # Generic gauges with 0-2 key/value pairs
    - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.+), (.+)=(.+)><>Value
      name: kafka_$1_$2_$3
      type: GAUGE
      labels:
        "$4": "$5"
        "$6": "$7"
    - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.+)><>Value
      name: kafka_$1_$2_$3
      type: GAUGE
      labels:
        "$4": "$5"
    - pattern: kafka.(\w+)<type=(.+), name=(.+)><>Value
      name: kafka_$1_$2_$3
      type: GAUGE
    # Emulate Prometheus 'Summary' metrics for the exported 'Histogram's.
    # Note that these are missing the '_sum' metric!
    - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.+), (.+)=(.+)><>Count
      name: kafka_$1_$2_$3_count
      type: COUNTER
      labels:
        "$4": "$5"
        "$6": "$7"
    - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.*), (.+)=(.+)><>(\d+)thPercentile
      name: kafka_$1_$2_$3
      type: GAUGE
      labels:
        "$4": "$5"
        "$6": "$7"
        quantile: "0.$8"
    - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.+)><>Count
      name: kafka_$1_$2_$3_count
      type: COUNTER
      labels:
        "$4": "$5"
    - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.*)><>(\d+)thPercentile
      name: kafka_$1_$2_$3
      type: GAUGE
      labels:
        "$4": "$5"
        quantile: "0.$6"
    - pattern: kafka.(\w+)<type=(.+), name=(.+)><>Count
      name: kafka_$1_$2_$3_count
      type: COUNTER
    - pattern: kafka.(\w+)<type=(.+), name=(.+)><>(\d+)thPercentile
      name: kafka_$1_$2_$3
      type: GAUGE
      labels:
        quantile: "0.$4"
  zookeeper-metrics-config.yml: |
    # See https://github.com/prometheus/jmx_exporter for more info about JMX Prometheus Exporter metrics
    lowercaseOutputName: true
    rules:
    # replicated Zookeeper
    - pattern: "org.apache.ZooKeeperService<name0=ReplicatedServer_id(\\d+)><>(\\w+)"
      name: "zookeeper_$2"
      type: GAUGE
    - pattern: "org.apache.ZooKeeperService<name0=ReplicatedServer_id(\\d+), name1=replica.(\\d+)><>(\\w+)"
      name: "zookeeper_$3"
      type: GAUGE
      labels:
        replicaId: "$2"
    - pattern: "org.apache.ZooKeeperService<name0=ReplicatedServer_id(\\d+), name1=replica.(\\d+), name2=(\\w+)><>(Packets\\w+)"
      name: "zookeeper_$4"
      type: COUNTER
      labels:
        replicaId: "$2"
        memberType: "$3"
    - pattern: "org.apache.ZooKeeperService<name0=ReplicatedServer_id(\\d+), name1=replica.(\\d+), name2=(\\w+)><>(\\w+)"
      name: "zookeeper_$4"
      type: GAUGE
      labels:
        replicaId: "$2"
        memberType: "$3"
    - pattern: "org.apache.ZooKeeperService<name0=ReplicatedServer_id(\\d+), name1=replica.(\\d+), name2=(\\w+), name3=(\\w+)><>(\\w+)"
      name: "zookeeper_$4_$5"
      type: GAUGE
      labels:
        replicaId: "$2"
        memberType: "$3"
kafkaUser yaml:
---------
kind: KafkaUser
metadata:
  name: syslog-vani-noacl
  labels:
    strimzi.io/cluster: versa-kafka-gke
spec:
  authentication:
    type: tls
  authorization:
    type: simple
    acls:
      # Topics and groups used by the HTTP clients through the HTTP Bridge
      # Change to match the topics used by your HTTP clients
      - resource:
          type: topic
          name: syslog.ueba-us4.v1.versa.demo3
          patternType: literal
        operation: Read
        host: "*"
      - resource:
          type: topic
          name: syslog.ueba-us4.v1.versa.demo3
          patternType: literal
        operation: Describe
        host: "*"
      - resource:
          type: topic
          name: syslog.ueba-us4.v1.versa.demo3
          patternType: literal
        operation: Write
        host: "*"
      - resource:
          type: group
          name: 'spark-kafka-source-'
          patternType: prefix
        operation: Read
        host: "*"
      - resource:
          type: group
          name: 'ss.consumer'
          patternType: literal
        operation: Read
        host: "*"
      - resource:
          type: group
          name: 'versa-console-consumer'
          patternType: literal
        operation: Read
        host: "*"
None of the consumer groups mentioned in the KafkaUser yaml are showing up in the metric kafka_consumergroup_lag.
Any ideas how to debug/fix this?
tia!
Please note: my Spark program is running on Dataproc (i.e., not on the Kubernetes cluster where Kafka is deployed). Does that affect how Kafka Exporter shows the consumer group lag?
CodePudding user response:
The Kafka Exporter exports the Prometheus metrics based on the committed consumer offsets from the __consumer_offsets topic. So when a consumer connects to your Kafka cluster, consumes some messages, and commits them, the exporter will see those offsets and show them in the metrics.
The KafkaUser CR, on the other hand, only lists the ACLs. It gives the user the right to use such a consumer group, but that does not mean the consumer group exists. Only once the user actually uses the group and commits something will it show up.
So what you are seeing could be completely fine and expected.
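To illustrate the mechanism described above: per partition, lag is the log-end offset minus the group's last committed offset, and a group that has never committed simply has no entries, so no kafka_consumergroup_lag series is exported for it. A minimal sketch (the topic name and all offset values here are made up for illustration, not taken from your cluster):

```python
# Sketch of how kafka_consumergroup_lag is derived: for each partition,
# lag = log-end offset - last committed offset. A group that never
# commits offsets has no entries in __consumer_offsets, so the exporter
# emits no lag series for it at all.

def group_lag(end_offsets, committed):
    """Return per-partition lag for the partitions a group has committed to."""
    return {
        tp: end_offsets[tp] - offset
        for tp, offset in committed.items()
        if tp in end_offsets
    }

# Latest (log-end) offsets per (topic, partition) -- illustrative values.
end_offsets = {("syslog.ueba-us4.v1.versa.demo3", 0): 120,
               ("syslog.ueba-us4.v1.versa.demo3", 1): 80}

# A group that has committed offsets gets a lag value per partition...
committed = {("syslog.ueba-us4.v1.versa.demo3", 0): 100,
             ("syslog.ueba-us4.v1.versa.demo3", 1): 80}
print(group_lag(end_offsets, committed))  # partition 0 lags by 20, partition 1 by 0

# ...while a group that only exists as an ACL entry and never committed
# produces nothing -- exactly the behaviour you are observing.
print(group_lag(end_offsets, {}))  # {}
```

This also matches your symptom: kafka_consumergroup_members reflects the live group membership, while the lag metric needs committed offsets to exist.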