KDA metrics in CloudWatch different from Flink metrics

Time:03-29

I have a Flink application deployed on AWS Kinesis Data Analytics (KDA). My current settings are:

Parallelism=128
Parallelism per KPU=4

The issue is that there is a big difference between the counts shown in the Flink web UI and the counts shown in CloudWatch, even for metrics that come out of the box.

Example:

Counts from the Flink UI:

Records Sent: close to 1 billion

Counts from CloudWatch:

NumRecordsIn and NumRecordsOut are both close to 10.8 million.

The KDA metrics level is configured at Task level.

Why am I seeing such a huge discrepancy? Does the parallelism have some effect on the counts?

FWIW, I also added a custom metric that tracks numRecordsIn, and its values are similar to the out-of-the-box NumRecordsIn metric.

Answer:

It appears that you are comparing the total numRecordsIn/Out across the lifetime of the job (roughly 1 billion) to the maximum ever seen in one minute (around 10 million).
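A small sketch with made-up per-minute counts illustrates the distinction. The values are hypothetical, and it assumes each CloudWatch datapoint covers one minute: the lifetime total is the sum of all periods, while the peak of the graph is only the single largest period.

```python
# Hypothetical per-minute record counts (illustrative only, not data from the question):
# each entry is the number of records emitted during one one-minute period.
per_minute_counts = [9_800_000, 10_500_000, 10_800_000, 10_200_000, 9_900_000]

# Lifetime total since the job started (the kind of number the Flink web UI shows).
lifetime_total = sum(per_minute_counts)

# Largest single one-minute datapoint (the peak you would read off a per-minute graph).
peak_per_minute = max(per_minute_counts)

print(f"lifetime total:  {lifetime_total:,}")
print(f"peak per minute: {peak_per_minute:,}")
```

If the peak is about 10 million records per minute, the job needs at least roughly 100 minutes of runtime to accumulate a lifetime total of about 1 billion. That is why the two numbers in the question can both be correct.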

Answer:

What David mentioned above is correct. I finally found out how to reproduce the values from the Flink UI in CloudWatch. The catch is that the CloudWatch value does not account for parallelism. To get the number of records emitted by a specific task, take Average(numRecordsOut) for that task and multiply it by the parallelism.
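The arithmetic behind this recipe can be sketched as follows. This is a minimal illustration, assuming (as the answer describes) that each parallel subtask of the task reports its own numRecordsOut datapoint and that CloudWatch's Average statistic is the mean across those datapoints. The per-subtask numbers are made up; only the parallelism of 128 comes from the question.

```python
# Hypothetical numbers for illustration only: one numRecordsOut datapoint per
# parallel subtask of a single task, using the question's parallelism of 128.
parallelism = 128
per_subtask_counts = [7_800_000 + (i % 4) * 100_000 for i in range(parallelism)]

# Average(numRecordsOut): the mean value across the subtasks' datapoints.
average_per_subtask = sum(per_subtask_counts) / len(per_subtask_counts)

# Multiplying the average by the parallelism recovers the sum over all subtasks,
# which, per the answer above, is the task-wide count the Flink web UI displays.
estimated_task_total = average_per_subtask * parallelism

print(f"Average(numRecordsOut): {average_per_subtask:,.0f}")
print(f"Average x parallelism:  {estimated_task_total:,.0f}")
print(f"Sum over subtasks:      {sum(per_subtask_counts):,}")
```

Average times parallelism is simply the sum over the subtasks written another way. With these made-up values, both come out to 1,017,600,000, the same order of magnitude as the Flink UI count in the question.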
