I'm trying to install a gitlab-runner on EC2. The executor that I want is Docker.
My config.toml is
concurrent = 10
check_interval = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = "My Docker Runner"
  url = "https://gitlab.com/"
  token = "SECRET"
  executor = "docker"
  [runners.custom_build_dir]
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]
    [runners.cache.azure]
  [runners.docker]
    tls_verify = false
    image = "docker:19.03.12"
    privileged = true
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/certs/client", "/cache"]
    shm_size = 0
My .gitlab-ci.yml is
image: docker:19.03.12

variables:
  # When you use the dind service, you must instruct Docker to talk with
  # the daemon started inside of the service. The daemon is available
  # with a network connection instead of the default
  # /var/run/docker.sock socket. Docker 19.03 does this automatically
  # by setting the DOCKER_HOST in
  # https://github.com/docker-library/docker/blob/d45051476babc297257df490d22cbd806f1b11e4/19.03/docker-entrypoint.sh#L23-L29
  #
  # The 'docker' hostname is the alias of the service container as described at
  # https://docs.gitlab.com/ee/ci/docker/using_docker_images.html#accessing-the-services.
  #
  # Specify to Docker where to create the certificates. Docker
  # creates them automatically on boot, and creates
  # `/certs/client` to share between the service and job
  # container, thanks to volume mount from config.toml
  DOCKER_TLS_CERTDIR: "/certs"

services:
  - docker:19.03.12-dind

before_script:
  - docker info

build:
  stage: build
  script:
    - docker build -t my-docker-image .
    - docker run my-docker-image /script/to/run/tests
I always get this error:
Running with gitlab-runner 14.4.0 (4b9e985a)
on My Docker Runner u9_6MpHg
Resolving secrets
00:00
Preparing the "docker" executor
Using Docker executor with image docker:19.03.12 ...
Starting service docker:19.03.12-dind ...
Authenticating with credentials from $DOCKER_AUTH_CONFIG
Pulling docker image docker:19.03.12-dind ...
Using docker image sha256:66dc2d45749a48592f4348fb3d567bdd65c9dbd5402a413b6d169619e32f6bd2 for docker:19.03.12-dind with digest docker@sha256:674f1f40ff7c8ac14f5d8b6b28d8fb1f182647ff75304d018003f1e21a0d8771 ...
Waiting for services to be up and running...
Authenticating with credentials from $DOCKER_AUTH_CONFIG
Pulling docker image docker:19.03.12 ...
Using docker image sha256:81f5749c9058a7284e6acd8e126f2b882765a17b9ead14422b51cde1a110b85c for docker:19.03.12 with digest docker@sha256:d41efe7ad0df5a709cfd4e627c7e45104f39bbc08b1b40d7fb718c562b3ce135 ...
Preparing environment
00:00
Running on runner-u96mphg-project-31310309-concurrent-0 via ip-10-120-65-72.ec2.internal...
Getting source from Git repository
00:02
Fetching changes with git depth set to 50...
Initialized empty Git repository in /builds/mediagrif/itt/network/poc/test-private-runner/.git/
Created fresh repository.
Checking out 3d7fe999 as main...
Skipping Git submodules setup
Executing "step_script" stage of the job script
00:01
Using docker image sha256:81f5749c9058a7284e6acd8e126f2b882765a17b9ead14422b51cde1a110b85c for docker:19.03.12 with digest docker@sha256:d41efe7ad0df5a709cfd4e627c7e45104f39bbc08b1b40d7fb718c562b3ce135 ...
$ docker info
Client:
Debug Mode: false
Server:
**ERROR: Cannot connect to the Docker daemon at tcp://localhost:2375. Is the docker daemon running?**
errors pretty printing info
Cleaning up project directory and file based variables
00:00
ERROR: Job failed: exit code 1
I spent the whole day on this and followed all the instructions in the GitLab documentation, but nothing works; I always get the same error. I tried the shell, docker, and docker machine executors, and the error is the same with each.
I tried DinD, direct socket binding, and the shell executor to build my Docker image.
I tried specifying DOCKER_HOST, setting a service alias, and disabling the certificates.
What I find strange is that even when I change DOCKER_HOST in my .gitlab-ci.yml, and I can see the record for the service in /etc/hosts, the error message always points to localhost.
I tried versions 13.11.0 and 14.4.0 of GitLab Runner. I tried installing the runner with YUM and also running it with docker run. In my .gitlab-ci.yml, I also tried both Docker 19 and Docker 20.
Nothing works.
Does anybody have a hint for me, please?
Thanks
Yann
CodePudding user response:
There are several things that may be going on here, but it sounds like you've tried the basic DOCKER_HOST stuff. Generally, DinD will set the host to what's necessary, so there is some issue with DinD connecting to your docker daemon on the host. Here are a couple things to try:
- SSH into your GitLab runner and run docker ps to ensure that the socket is running properly. It's possible that the socket is not set to run on startup.
- While you're connected to the box via SSH, ensure that you can access docker without sudo. If your gitlab-runner user needs to use sudo to access docker, you will get errors.
- Start a DinD container on your runner box, passing in the privileged flag, and attempt to access docker from within the DinD container.
Odds are good that the error lies in how docker is configured on the host; nothing looks wrong with your runner toml or your CI yml.
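The three checks above could be scripted roughly as follows. This is only a sketch, assuming it is run on the runner host as the user the runner runs under (often gitlab-runner). The function names and the container name dind-test are made up for illustration, and the 10-second wait is an arbitrary guess at how long the inner daemon needs to start.

```shell
#!/bin/sh
# Hypothetical helpers; neither function comes from GitLab or Docker.

# Checks 1 and 2: the host daemon answers, and it does so without sudo.
check_host_docker() {
    if docker ps >/dev/null 2>&1; then
        echo "host docker: reachable without sudo"
    else
        echo "host docker: NOT reachable (daemon stopped, or no access to the socket)"
        return 1
    fi
}

# Check 3: start a privileged DinD container and query the daemon inside it.
check_dind() {
    name="dind-test"   # assumed container name, not from the original post
    docker run --privileged -d --name "$name" docker:19.03.12-dind >/dev/null || return 1
    sleep 10           # arbitrary wait for the inner daemon to come up
    docker exec "$name" docker info >/dev/null 2>&1
    status=$?
    docker rm -f "$name" >/dev/null 2>&1   # always clean up the test container
    if [ "$status" -eq 0 ]; then
        echo "dind: inner daemon reachable"
    else
        echo "dind: inner daemon NOT reachable"
    fi
    return "$status"
}
```

Running check_host_docker && check_dind on the runner box would then show which of the two layers fails first.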
CodePudding user response:
Two things:
When using the docker:dind service, the hostname of the docker daemon is docker, not localhost. The GitLab docs kind of contradict themselves here.
While the docker:19.03.12 image does set the docker host correctly, you sometimes need to specify DOCKER_HOST for the benefit of the dind container itself, which has a totally different entrypoint that can result in the docker host being set to tcp://localhost:2375, which won't work when TLS is enabled.
Also, when DOCKER_TLS_CERTDIR is specified, TLS is enabled by default and the TLS-enabled listening port is 2376, not 2375.
To correct this, make either of the following configuration changes:
variables:
  DOCKER_TLS_CERTDIR: "/certs"
  DOCKER_HOST: "tcp://docker:2376" # dind with TLS enabled
OR
variables:
  DOCKER_TLS_CERTDIR: ""
  DOCKER_HOST: "tcp://docker:2375" # dind with TLS disabled
You should also double check that your certificate directory actually contains the proper certificates or that the mount point is writable, otherwise if the certs are missing, dind treats it as if TLS is disabled.
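One way to do that check from inside the job is a small helper like the one below. It is a sketch under assumptions: the function name is hypothetical, the default path /certs/client is taken from the volumes entry in the question's config.toml, and ca.pem, cert.pem and key.pem are the file names the Docker client conventionally looks for in its certificate directory.

```shell
#!/bin/sh
# Hypothetical helper: verify that a directory holds non-empty client TLS files.
# $1 = directory to inspect (defaults to /certs/client, as in config.toml).
check_client_certs() {
    dir="${1:-/certs/client}"
    missing=0
    for f in ca.pem cert.pem key.pem; do
        # -s is true only when the file exists and is not empty
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f"
            missing=1
        fi
    done
    if [ "$missing" -eq 0 ]; then
        echo "client certificates present in $dir"
    fi
    return "$missing"
}
```

Calling check_client_certs before docker info in the job would show whether the shared volume was actually populated by the dind service.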
If your job seems to use localhost:2375 despite your environment variable, the variable must be overridden somewhere, for example by a project-level or group-level CI/CD setting, which would override your YAML configuration.
You can confirm this in your job script by echoing the value:
script:
  - echo $DOCKER_HOST
  - docker info
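Beyond echoing the value, the job could also compare it against the TLS setting. The helper below is a sketch, not part of GitLab or Docker: the function name is hypothetical, and it assumes the dind service is reachable under the default alias docker, with port 2376 for TLS and 2375 without, as described above.

```shell
#!/bin/sh
# Hypothetical helper: report whether DOCKER_HOST agrees with the TLS setting.
# $1 = value of DOCKER_HOST, $2 = value of DOCKER_TLS_CERTDIR (may be empty).
check_docker_host() {
    host="$1"
    certdir="$2"
    if [ -n "$certdir" ]; then
        expected="tcp://docker:2376"   # TLS enabled
    else
        expected="tcp://docker:2375"   # TLS disabled
    fi
    case "$host" in
        "$expected")
            echo "ok: $host" ;;
        *localhost*)
            echo "mismatch: $host points at localhost, expected $expected"
            return 1 ;;
        *)
            echo "mismatch: got '$host', expected $expected"
            return 1 ;;
    esac
}
```

In a job it would be called as check_docker_host "$DOCKER_HOST" "$DOCKER_TLS_CERTDIR"; a mismatch that names localhost suggests the variable is being overridden outside the YAML file.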