I'm trying to monitor the status of my EKS node groups. They sometimes show as Degraded, and I want a CloudWatch alert whenever that happens. I checked CloudWatch Metrics and found no standard metric for this, and I can't find a corresponding event in CloudTrail either.
Is it possible to create such an alarm using CloudTrail events, EventBridge, or CloudWatch? Any help finding a solution would be appreciated.
CodePudding user response:
For CloudWatch, please take a look at this:
CodePudding user response:
I think you can combine Lambda, CloudWatch, and EventBridge to implement a simple health check for one or more node groups.
For your health check Lambda function:
- We create a Lambda function with Python 3 (3.9, for example)
- We describe the node group using Boto3
- We put a custom metric to CloudWatch metrics: if the status is Active (returned as ACTIVE by the API), we put 1, else 0.
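The steps above could be sketched roughly as follows. This is a minimal sketch, not a tested deployment: the cluster and node group names, the Custom/EKS namespace, the NodegroupHealthy metric name, and the optional eks/cloudwatch parameters (added only so the function can be exercised with stub clients) are all assumptions for illustration.

```python
import os

# Hypothetical names: adjust to your cluster, node group, and metric namespace.
CLUSTER_NAME = os.environ.get("CLUSTER_NAME", "my-cluster")
NODEGROUP_NAME = os.environ.get("NODEGROUP_NAME", "my-nodegroup")
NAMESPACE = "Custom/EKS"
METRIC_NAME = "NodegroupHealthy"


def status_to_value(status):
    """Map an EKS node group status to 1 (ACTIVE) or 0 (anything else)."""
    return 1 if status == "ACTIVE" else 0


def build_metric_data(cluster, nodegroup, status):
    """Build the MetricData payload for CloudWatch put_metric_data."""
    return [{
        "MetricName": METRIC_NAME,
        "Dimensions": [
            {"Name": "ClusterName", "Value": cluster},
            {"Name": "NodegroupName", "Value": nodegroup},
        ],
        "Value": status_to_value(status),
        "Unit": "Count",
    }]


def lambda_handler(event, context, eks=None, cloudwatch=None):
    # boto3 is imported lazily so the helpers above can be used without it;
    # Lambda itself only passes (event, context).
    if eks is None or cloudwatch is None:
        import boto3
        eks = eks or boto3.client("eks")
        cloudwatch = cloudwatch or boto3.client("cloudwatch")

    # Read the current node group status from EKS.
    response = eks.describe_nodegroup(
        clusterName=CLUSTER_NAME, nodegroupName=NODEGROUP_NAME
    )
    status = response["nodegroup"]["status"]

    # Publish 1 for ACTIVE, 0 for DEGRADED or any other status.
    cloudwatch.put_metric_data(
        Namespace=NAMESPACE,
        MetricData=build_metric_data(CLUSTER_NAME, NODEGROUP_NAME, status),
    )
    return {"status": status}
```

The function's execution role would need at least eks:DescribeNodegroup and cloudwatch:PutMetricData permissions. To cover several node groups, one option is to loop over a list of names and publish one data point per node group.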
Once the function is ready, we set up a schedule to run it every minute (the interval is up to you).
- We create an EventBridge (EB) rule that triggers every minute
- The EB rule target is the Lambda function
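If you prefer to create the schedule with Boto3 rather than the console, a sketch along these lines might work. The rule name, target Id, and statement Id are hypothetical, and function_arn is assumed to be the ARN of the health-check Lambda function; the optional client parameters exist only so the function can be tried with stubs.

```python
def schedule_health_check(function_arn,
                          rule_name="eks-nodegroup-health-check",
                          events=None, lambda_client=None):
    """Create a 1-minute EventBridge schedule that invokes the Lambda function."""
    # boto3 is imported lazily so this module loads even where it is absent.
    if events is None or lambda_client is None:
        import boto3
        events = events or boto3.client("events")
        lambda_client = lambda_client or boto3.client("lambda")

    # Create (or update) the scheduled rule; change the rate as needed.
    rule_arn = events.put_rule(
        Name=rule_name,
        ScheduleExpression="rate(1 minute)",
        State="ENABLED",
    )["RuleArn"]

    # Allow EventBridge to invoke the function for this rule only.
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )

    # Point the rule at the Lambda function.
    events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "health-check-lambda", "Arn": function_arn}],
    )
    return rule_arn
```

Note that add_permission fails if a statement with the same StatementId already exists, so re-running this sketch unchanged against the same function would need that case handled.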
Once we have enough data points in CloudWatch metrics, we can create a CloudWatch alarm to notify us by e-mail or other channels.
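As an illustration, the alarm could also be created with Boto3. This sketch assumes a custom metric named NodegroupHealthy in a Custom/EKS namespace with ClusterName and NodegroupName dimensions (all hypothetical names), and an existing SNS topic with an e-mail subscription whose ARN you pass in. The three-period evaluation window is an arbitrary choice.

```python
def build_alarm_params(cluster, nodegroup, sns_topic_arn):
    """Parameters for an alarm that fires when the health metric drops below 1."""
    return {
        "AlarmName": f"eks-nodegroup-unhealthy-{cluster}-{nodegroup}",
        "Namespace": "Custom/EKS",          # hypothetical namespace
        "MetricName": "NodegroupHealthy",   # hypothetical metric name
        "Dimensions": [
            {"Name": "ClusterName", "Value": cluster},
            {"Name": "NodegroupName", "Value": nodegroup},
        ],
        "Statistic": "Minimum",
        "Period": 60,                # seconds, matching a 1-minute schedule
        "EvaluationPeriods": 3,      # alarm after 3 consecutive bad minutes
        "Threshold": 1,
        "ComparisonOperator": "LessThanThreshold",
        # Treat missing data as unhealthy, e.g. if the Lambda stops running.
        "TreatMissingData": "breaching",
        "AlarmActions": [sns_topic_arn],
    }


def create_alarm(cluster, nodegroup, sns_topic_arn, cloudwatch=None):
    # boto3 is imported lazily so build_alarm_params can be used without it.
    if cloudwatch is None:
        import boto3
        cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_alarm(
        **build_alarm_params(cluster, nodegroup, sns_topic_arn)
    )
```

Using the Minimum statistic means a single 0 data point in a period marks that period as unhealthy; lowering EvaluationPeriods makes the alarm react faster at the cost of more noise.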
References:
- https://stackify.com/custom-metrics-aws-lambda/
- https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-run-lambda-schedule.html
- https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/eks.html#EKS.Client.describe_nodegroup
- https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/cloudwatch.html