I was previously starting my ECS task on Fargate and it was running fine. The task definition had Network mode = awsvpc, and the cluster was not associated with any capacity provider.
Now I'm trying to use the EC2 "Launch type" (network mode is still awsvpc, and the target group type is IP).
1. I created an autoscaling group with a launch configuration, using ami-0da25582fb45be38c (amzn2-ami-ecs-hvm-2.0.20220822-x86_64-ebs) and a specific vpcID / security group / subnets.
2. I created a capacity provider in my ECS cluster and associated it with the autoscaling group that I created in step 1.
3. I re-created the ECS service and specified the capacity provider that I created in step 2 as a "Custom capacity provider strategy", with weight=100, base=1, and also specified the vpcID / security group / subnets that I used in step 1.
4. Now I set min=0, desired=1, max=1 in the autoscaling group. I see that one EC2 instance successfully spins up and runs. I can SSH into it using a PEM certificate, and when I run docker ps -a, I can see that the amazon/amazon-ecs-agent:latest container continuously starts, immediately exits, starts again after 15 seconds, exits, and so on. Not sure if this is expected.
5. Finally, I set min=0, desired=1, max=1 in my ECS service. I can see one task in the task list, but its state is stuck in PROVISIONING and doesn't change. Correspondingly, no EC2 instance is allocated to it.
It seems that ecs-agent is constantly restarting, and that's why the ECS service cannot use the EC2 instance for the task. Does anyone have a clue why ecs-agent restarts?
UPD. Checked the docker logs for ecs-agent:
$ docker logs -f ecs-agent
level=info time=2022-09-06T05:20:25Z msg="Successfully got ECS instance credentials from provider: EC2RoleProvider" module=instancecreds_linux.go
level=info time=2022-09-06T05:20:25Z msg="Starting Amazon ECS Agent" commit="a1a5ecbc" version="1.62.2"
level=info time=2022-09-06T05:20:25Z msg="Loading configuration"
level=warn time=2022-09-06T05:20:25Z msg="Unable to fetch user data: EC2MetadataError: failed to make EC2Metadata request\n\tstatus code: 404, request id: \ncaused by: <?xml version=\"1.0\" encoding=\"iso-8859-1\"?>\n<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\"\n\t\t \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en\" lang=\"en\">\n <head>\n <title>404 - Not Found</title>\n </head>\n <body>\n <h1>404 - Not Found</h1>\n </body>\n</html>\n" module=config.go
level=info time=2022-09-06T05:20:25Z msg="Successfully got ECS instance credentials from provider: EC2RoleProvider" module=instancecreds_linux.go
level=info time=2022-09-06T05:20:25Z msg="Successfully got ECS instance credentials from provider: EC2RoleProvider" module=instancecreds_linux.go
level=info time=2022-09-06T05:20:25Z msg="Image excluded from cleanup" image="amazon/amazon-ecs-pause:0.1.0"
level=info time=2022-09-06T05:20:25Z msg="Image excluded from cleanup" image="amazon/amazon-ecs-pause:0.1.0"
level=info time=2022-09-06T05:20:25Z msg="Image excluded from cleanup" image="amazon/amazon-ecs-agent:latest"
level=info time=2022-09-06T05:20:25Z msg="Event stream ContainerChange start listening..." module=eventstream.go
level=info time=2022-09-06T05:20:25Z msg="Loading state!" module=state_manager.go
level=info time=2022-09-06T05:20:25Z msg="eni watcher has been initialized" module=watcher_linux.go
level=info time=2022-09-06T05:20:25Z msg="Registering Instance with ECS"
level=info time=2022-09-06T05:20:25Z msg="Remaining mem: 956" module=client.go
level=error time=2022-09-06T05:20:25Z msg="Unable to register as a container instance with ECS: ClientException: The referenced cluster was inactive." module=client.go
level=info time=2022-09-06T05:20:25Z msg="Remaining mem: 956" module=client.go
level=error time=2022-09-06T05:20:25Z msg="Unable to register as a container instance with ECS: ClientException: The referenced cluster was inactive." module=client.go
level=error time=2022-09-06T05:20:25Z msg="Error registering container instance" error="ClientException: The referenced cluster was inactive."
The role that I'm using for the autoscaling group allows the following ECS actions:
"ecs:CreateCluster",
"ecs:ListClusters",
"ecs:DeregisterContainerInstance",
"ecs:DiscoverPollEndpoint",
"ecs:Poll",
"ecs:RegisterContainerInstance",
"ecs:StartTelemetrySession",
"ecs:UpdateContainerInstancesState",
"ecs:Submit*",
Maybe the problem is in the AMI and I can use another one? Although the ecs-init version is 1.62.2, so it's up to date.
Answer:
It seems that the ECS cluster the agent is trying to register with doesn't exist. Check the value of ECS_CLUSTER in the file /etc/ecs/ecs.config. If it's not set, the agent falls back to a cluster named default, and most probably you don't have a cluster named default in your ECS account.
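The check described above can be sketched as a small shell helper to run on the instance. This is only an illustration: the function name effective_cluster is made up here, and taking the last ECS_CLUSTER line when the file contains several is an assumption, not documented agent behaviour.

```shell
#!/bin/bash
# Print the cluster name found in an ecs.config-style file, falling back to
# "default" when no ECS_CLUSTER line is present (the fallback described above).
# ASSUMPTION: if the file has several ECS_CLUSTER lines, the last one is reported.
effective_cluster() {
  local file="$1" value=""
  if [ -r "$file" ]; then
    # Strip the "ECS_CLUSTER=" prefix from matching lines and keep the last one.
    value="$(sed -n 's/^ECS_CLUSTER=//p' "$file" | tail -n 1)"
  fi
  echo "${value:-default}"
}

# Report the value for the standard config path (or a path given as $1).
effective_cluster "${1:-/etc/ecs/ecs.config}"
```

If the printed name is default, or differs from the cluster that owns the capacity provider, that would explain the "The referenced cluster was inactive" error in the agent log. From a machine whose credentials allow ecs:DescribeClusters, running aws ecs describe-clusters --clusters NAME --query 'clusters[].status' with that name should show whether the cluster is ACTIVE.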
Edit: You can either set it to the proper value in the AMI and spin up new instances in the cluster from the new AMI, or update the user data in the launch template to set the value when new instances come up.
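A minimal sketch of the user data approach, assuming an ECS-optimized Amazon Linux 2 AMI like the one in the question. The cluster name my-cluster is a placeholder, and the helper set_ecs_cluster and the ECS_CONFIG_FILE override are hypothetical names introduced only for this example. The commonly seen one-liner simply appends ECS_CLUSTER=name to /etc/ecs/ecs.config; the version below also drops any existing entry first so the file never ends up with two.

```shell
#!/bin/bash
# ASSUMPTION: "my-cluster" is a placeholder; use the name of your real cluster.
CLUSTER_NAME="${CLUSTER_NAME:-my-cluster}"
# Overridable only so the script can be tried outside an instance.
ECS_CONFIG_FILE="${ECS_CONFIG_FILE:-/etc/ecs/ecs.config}"

# Write ECS_CLUSTER=<name> into <file>, replacing any existing entry.
set_ecs_cluster() {
  local name="$1" file="$2" tmp
  mkdir -p "$(dirname "$file")"
  touch "$file"
  tmp="$(mktemp)"
  # Keep every line except an existing ECS_CLUSTER entry (grep exits 1 when
  # nothing is left, hence "|| true"), then append the new value.
  grep -v '^ECS_CLUSTER=' "$file" > "$tmp" || true
  echo "ECS_CLUSTER=${name}" >> "$tmp"
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# User data runs as root at first boot; skip the write when not root.
if [ "$(id -u)" -eq 0 ]; then
  set_ecs_cluster "$CLUSTER_NAME" "$ECS_CONFIG_FILE"
else
  echo "not running as root; skipping write to $ECS_CONFIG_FILE" >&2
fi
```

Instances that are already running would not pick this up automatically, since the agent reads the file when it starts. Replacing the instance through the autoscaling group, or editing the file by hand and restarting the agent (sudo systemctl restart ecs on Amazon Linux 2), should apply the change.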
More about the ECS container agent configurations here: https://github.com/aws/amazon-ecs-agent/blob/master/README.md