Application Load Balancer Target Group Register/Deregister Infinite Loop

Setup

Security Groups

ALB (inbound rules)
- HTTPS:443 from 0.0.0.0/0 & ::/0
- HTTP:80 from 0.0.0.0/0 & ::/0
Cluster (inbound rules)
- All traffic from ALB security group

Cluster

instance is t2.micro (only running 1 instance in subnets us-east-1<a,b,c> under default VPC with public IP enabled)
client → 0.375 vCPU/0.25 GB, 1 task, bridge network, 0:3000 (host:container)
server → 0.25 vCPU/0.25 GB, 2 tasks, bridge network, 0:5000 (host:container)

ALB

availability zones: us-east-1<a,b,c>, same default VPC
listeners:
- HTTP:80 → redirect to HTTPS://#{host}:443/#{path}?#{query}
- HTTPS:443 (/) → forward to client target group
- HTTPS:443 (/api) → forward to server target group

Target Groups

client → HTTP:3000 with default health check of HTTP, /, Traffic Port, 5 healthy, 2 unhealthy, 5s timeout, 30s interval, 200 OK
server → HTTP:5000 with health check of HTTP, /api/health, Traffic Port, 5 healthy, 2 unhealthy, 5s timeout, 30s interval, 200 OK

Both Docker images (client and server) work properly locally, and the client service seems to work well in AWS ECS. However, the server service keeps cycling between registering and deregistering (draining) its targets, seemingly without the targets ever being marked unhealthy.

Here is what I see in the service Deployments and events tab:

5/12/2022, 8:43:04 PM   service server registered 2 targets in target-group <...>
5/12/2022, 8:42:54 PM   service server has started 2 tasks: task <...> task <...>.  <...>
5/12/2022, 8:42:51 PM   service server deregistered 1 targets in target-group <...>
5/12/2022, 8:42:51 PM   service server has begun draining connections on 1 tasks.   <...>
5/12/2022, 8:42:51 PM   service server deregistered 1 targets in target-group <...>
5/12/2022, 8:42:17 PM   service server registered 2 targets in target-group <...>
5/12/2022, 8:42:07 PM   service server has started 2 tasks: task <...> task <...>.  <...>
5/12/2022, 8:42:04 PM   service server deregistered 1 targets in target-group <...>
5/12/2022, 8:42:04 PM   service server has begun draining connections on 1 tasks.   <...>
5/12/2022, 8:42:04 PM   service server deregistered 1 targets in target-group <...>
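To put numbers on the loop, the timestamps in the events above can be subtracted from each other. This is only a rough sketch: the helper names are made up for illustration, and the timestamps are copied by hand from the log rather than fetched from AWS.

```python
from datetime import datetime

# Timestamp layout used in the ECS events tab output above.
FORMAT = "%m/%d/%Y, %I:%M:%S %p"


def parse(timestamp: str) -> datetime:
    """Parse one event timestamp such as '5/12/2022, 8:42:17 PM'."""
    return datetime.strptime(timestamp, FORMAT)


def seconds_between(earlier: str, later: str) -> float:
    """Return the number of seconds from `earlier` to `later`."""
    return (parse(later) - parse(earlier)).total_seconds()


# How long the targets stayed registered before being deregistered again.
registered_for = seconds_between("5/12/2022, 8:42:17 PM", "5/12/2022, 8:42:51 PM")

# How long one full cycle takes (deregistration to next deregistration).
cycle_length = seconds_between("5/12/2022, 8:42:04 PM", "5/12/2022, 8:42:51 PM")

print(registered_for, cycle_length)
```

With the values from the log, the targets stay registered for 34 seconds and the whole cycle repeats every 47 seconds.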

Any ideas?

CodePudding user response：

Have you added your ALB SG as a source to the SG attached to your containerized application?
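One way to check this without clicking through the console is to inspect the inbound rules of the cluster SG. The sketch below assumes a security group dict in the shape returned by boto3's `describe_security_groups` (`IpPermissions` containing `UserIdGroupPairs`); the function name and the group IDs are hypothetical placeholders.

```python
def allows_source_group(security_group: dict, source_group_id: str) -> bool:
    """Return True if any inbound rule lists `source_group_id` as a source."""
    for permission in security_group.get("IpPermissions", []):
        for pair in permission.get("UserIdGroupPairs", []):
            if pair.get("GroupId") == source_group_id:
                return True
    return False


# Hypothetical cluster SG: all traffic allowed from a (made-up) ALB SG ID.
cluster_sg = {
    "GroupId": "sg-cluster-example",
    "IpPermissions": [
        {"IpProtocol": "-1", "UserIdGroupPairs": [{"GroupId": "sg-alb-example"}]}
    ],
}

print(allows_source_group(cluster_sg, "sg-alb-example"))    # True
print(allows_source_group(cluster_sg, "sg-other-example"))  # False

# Against a real account, the dict could be fetched with something like:
#   import boto3
#   ec2 = boto3.client("ec2")
#   cluster_sg = ec2.describe_security_groups(GroupIds=["<cluster SG ID>"])["SecurityGroups"][0]
```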

CodePudding user response：

After enabling AWS CloudWatch logs in my task definition's container specs, I was able to see that the issue was actually with an AWS RDS instance.
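For anyone wondering what enabling the logs looks like, it comes down to adding a `logConfiguration` with the `awslogs` driver to the container definition. Here is a minimal sketch as a Python dict; the helper name, the log group `/ecs/server`, and the stream prefix are assumptions for illustration, and the container definition is trimmed to the fields mentioned in the question.

```python
import copy


def with_awslogs(container_def: dict, group: str, region: str, prefix: str) -> dict:
    """Return a copy of a container definition with CloudWatch logging enabled."""
    updated = copy.deepcopy(container_def)
    updated["logConfiguration"] = {
        "logDriver": "awslogs",
        "options": {
            "awslogs-group": group,           # CloudWatch log group (assumed name)
            "awslogs-region": region,         # region of the log group
            "awslogs-stream-prefix": prefix,  # prefix for the log stream names
        },
    }
    return updated


# Trimmed-down server container definition (dynamic host port, container port 5000).
server = {
    "name": "server",
    "portMappings": [{"hostPort": 0, "containerPort": 5000}],
}

server_with_logs = with_awslogs(server, "/ecs/server", "us-east-1", "ecs")
print(server_with_logs["logConfiguration"])
```

The log group generally has to exist already, and the instance or task execution role needs permission to write to it, otherwise the task can fail to start.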

The RDS instance's SG only accepted traffic from an old cluster SG (which no longer exists), so the server tasks presumably could not reach the database. That clears up why a health check wasn't being performed and why the registered targets were draining immediately.
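A leftover rule like this can be spotted by comparing the SG IDs referenced in a group's inbound rules against the SGs that still exist. The sketch below again assumes the dict shape returned by boto3's `describe_security_groups`; the function name, the group IDs, and the port are hypothetical.

```python
def stale_group_references(security_group: dict, existing_group_ids: set) -> list:
    """List source SG IDs in the inbound rules that are not in `existing_group_ids`."""
    stale = []
    for permission in security_group.get("IpPermissions", []):
        for pair in permission.get("UserIdGroupPairs", []):
            group_id = pair.get("GroupId")
            if group_id and group_id not in existing_group_ids and group_id not in stale:
                stale.append(group_id)
    return stale


# Hypothetical RDS SG that still points at a deleted cluster SG.
rds_sg = {
    "GroupId": "sg-rds-example",
    "IpPermissions": [
        {
            "IpProtocol": "tcp",
            "FromPort": 5432,  # hypothetical database port
            "ToPort": 5432,
            "UserIdGroupPairs": [{"GroupId": "sg-old-cluster-example"}],
        }
    ],
}

# IDs of the SGs that currently exist (made-up values).
existing = {"sg-rds-example", "sg-cluster-example", "sg-alb-example"}

print(stale_group_references(rds_sg, existing))  # ['sg-old-cluster-example']
```

If the list is non-empty, the rule needs to be updated to reference the current cluster SG.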