I'm so close to finding a nice setup with Docker Compose and ECS, but there is one small thing remaining.
The scenario goes like this:
- Update app (Django) source code and deploy to ECS using Docker Compose and Docker Context.
- ECS registers a new task for the app and starts it along with the old one.
- Problem: nginx keeps health-checking the old container, and once that container is deregistered, nginx starts throwing 502 errors and the task restarts, leading to downtime and unavailability.
- nginx starts up again and health-checks the new container. The app works again, but only after the undesired downtime mentioned above.
Is there some config I need to do here? Am I missing something?
docker-compose.yml for reference:
services:
  web:
    image: # Image from ECR, built from GH-action.
    command: gunicorn core.wsgi:application --bind 0.0.0.0:8000
    environment:
      # ...
    volumes:
      # ...
    deploy:
      replicas: 1
  nginx:
    image: # Image from ECR, kept static
    ports:
      - "80:80"
    volumes:
      # ...
    depends_on:
      - web
    deploy:
      replicas: 1
Answer:
So it turns out this was quite the challenge. The bottom line is that an ECS task cannot be updated while it's running, so we need to either restart the task or use execute-command to update it manually.
I tried the jwilder/nginx-proxy approach, but this was not possible with Fargate because of the way volume mounting works with that launch type: the proxy relies on mounting the host's Docker socket, which Fargate does not allow.
I ended up using the sidecar pattern for my nginx container. However, the Compose-CLI currently has no built-in solution for sidecars (see https://github.com/docker/compose-cli/issues/1566), so I had to use the x-aws-cloudformation overlay in a slightly messy way.
So first we just remove the nginx service:
version: "3.9"
services:
  web:
    image: # django-app
    command: gunicorn core.wsgi:application --bind 0.0.0.0:8000
    environment:
      # ...
    volumes:
      # ...
    ports:
      - "80:80" # Move ports into this service so we get the ALB
    deploy:
      replicas: 1
Run the convert command to get the generated CloudFormation template:
docker compose convert > cfn.yml
Then add the x-aws-cloudformation overlay:
x-aws-cloudformation:
  Resources:
    WebTaskDefinition:
      Properties:
        ContainerDefinitions:
          # Generated container definitions, copy/paste from cfn.yml
          # Only change ContainerPort for web:
          - # ...Web_ResolvConf_InitContainer
          - # ...Web-container
            PortMappings:
              - ContainerPort: 8000
          # The nginx sidecar:
          - DependsOn:
              - Condition: SUCCESS
                ContainerName: Web_ResolvConf_InitContainer
              - Condition: START
                ContainerName: web
            Essential: true
            Image: # nginx
            LogConfiguration:
              # ...
            MountPoints:
              # ...
            Name: nginx
            PortMappings:
              - ContainerPort: 80
    # We also need to tell the load-balancer to reference the nginx container
    WebService:
      Properties:
        LoadBalancers:
          - ContainerName: nginx
            ContainerPort: 80
            TargetGroupArn:
              Ref: WebTCP80TargetGroup
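Finally, we need to change the nginx config a bit. The sidecar now runs in the same task as the web container, so the upstream should point at the local port instead of the web service name: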
# BEFORE
upstream app_server {
    server web:8000 fail_timeout=0;
}
# AFTER
upstream app_server {
    server 0.0.0.0:8000 fail_timeout=0;
}
Not pretty, but it works. Rolling updates will now have zero downtime as expected. Hopefully, this pattern will be improved with the evolution of the Compose-CLI!
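If you would rather spell out the rolling behaviour than rely on defaults, the same overlay mechanism can also set the service's deployment configuration. This is only a sketch and not part of the setup above: it assumes the generated service resource is named WebService (as in cfn.yml above), the percentages are illustrative, and the block would need to be merged into the existing top-level x-aws-cloudformation key rather than added as a second one.

```yaml
x-aws-cloudformation:
  Resources:
    WebService:
      Properties:
        # Standard AWS::ECS::Service property; values here are assumptions.
        DeploymentConfiguration:
          # Never drop below the desired task count during a deployment,
          # so the old task keeps serving until its replacement is up.
          MinimumHealthyPercent: 100
          # Allow up to twice the desired count, so with replicas: 1
          # one extra task can start alongside the old one.
          MaximumPercent: 200
```

With replicas: 1, these values describe the behaviour from the question: the new task starts next to the old one, and the old one is only drained afterwards.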