The problem
I am attempting to create a docker-compose file that will host three services: InfluxDB, Grafana, and a custom script, built from a custom Dockerfile, that fills the database. I am having networking issues: the custom script cannot connect to InfluxDB because of a connection refused error (shown below).
What is working so far
Interestingly, when I remove the custom script service (called ads_agent) from my docker-compose file and either run the script from my local machine or build and run the Dockerfile in its own container, it connects just fine.
What's the difference between the two
My script reads an environment variable called KTS_TELEMETRY_INFLUXDB_URL, which the InfluxDB client uses as the URL to connect to. When I run the script from my command line, I can use "http://localhost:8086" and that works. When I wrap the script in a Docker container, I use my local machine's LAN IP address instead, because inside the container localhost would refer to the container itself. That also works just fine.
From within my docker-compose, since all three services are on the same network, I'm using "http://influxdb:8086" since that host name should be bound to that service's network interface. And indeed it is, because Grafana is connecting just fine using that URL. Sadly, when I try this with the script, I get connection refused.
The error
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f18c1fec970>: Failed to establish a new connection: [Errno 111] Connection refused
My code
This is my docker-compose.yaml
version: "3"
services:
  influxdb:
    container_name: influxdb
    image: influxdb:2.0.9-alpine # influxdb:latest
    networks:
      - telemetry_network
    ports:
      - 8086:8086
    volumes:
      - influxdb-storage:/var/lib/influxdb2
    restart: always
    environment:
      - DOCKER_INFLUXDB_INIT_MODE=setup
      - DOCKER_INFLUXDB_INIT_USERNAME=$KTS_TELEMETRY_INFLUXDB_USERNAME
      - DOCKER_INFLUXDB_INIT_PASSWORD=$KTS_TELEMETRY_INFLUXDB_PASSWORD
      - DOCKER_INFLUXDB_INIT_ORG=$KTS_TELEMETRY_INFLUXDB_ORG
      - DOCKER_INFLUXDB_INIT_BUCKET=$KTS_TELEMETRY_INFLUXDB_BUCKET
      - DOCKER_INFLUXDB_INIT_RETENTION=$KTS_TELEMETRY_INFLUXDB_RETENTION
      - DOCKER_INFLUXDB_INIT_ADMIN_TOKEN=$KTS_TELEMETRY_INFLUXDB_TOKEN
  grafana:
    container_name: grafana
    image: grafana/grafana:8.1.7 # grafana/grafana:latest
    networks:
      - telemetry_network
    ports:
      - 3000:3000
    volumes:
      - grafana-storage:/var/lib/grafana
    restart: always
    depends_on:
      - influxdb
  ads_agent:
    container_name: ads_agent
    build: ./ads_agent
    networks:
      - telemetry_network
    restart: always
    depends_on:
      - influxdb
    environment:
      - KTS_TELEMETRY_INFLUXDB_URL=http://influxdb:8086
      - KTS_TELEMETRY_INFLUXDB_TOKEN=$KTS_TELEMETRY_INFLUXDB_TOKEN
      - KTS_TELEMETRY_INFLUXDB_ORG=$KTS_TELEMETRY_INFLUXDB_ORG
      - KTS_TELEMETRY_INFLUXDB_BUCKET=$KTS_TELEMETRY_INFLUXDB_BUCKET
networks:
  telemetry_network:
volumes:
  influxdb-storage:
  grafana-storage:
This is my ads_agent/Dockerfile
FROM python:3.9
COPY requirements.txt .
RUN pip install --upgrade pip
RUN pip install -r /requirements.txt
COPY main.py .
ENTRYPOINT /usr/local/bin/python3 /main.py
ads_agent/requirements.txt just has the influxdb-client, and this is my ads_agent/main.py
import os
from influxdb_client import InfluxDBClient, Point, WritePrecision
from influxdb_client.client.write_api import SYNCHRONOUS
from datetime import datetime
import random
import time

token = os.environ["KTS_TELEMETRY_INFLUXDB_TOKEN"]
org = os.environ["KTS_TELEMETRY_INFLUXDB_ORG"]
bucket = os.environ["KTS_TELEMETRY_INFLUXDB_BUCKET"]
url = os.environ["KTS_TELEMETRY_INFLUXDB_URL"]

client = InfluxDBClient(url=url, token=token)
dbh = client.write_api(write_options=SYNCHRONOUS)

while True:
    symbol_name = 'rand_num'
    value = random.random()
    timestamp = datetime.utcnow()
    print(timestamp, symbol_name, value)
    point = Point("mem") \
        .field(symbol_name, value) \
        .time(timestamp, WritePrecision.NS)
    dbh.write(bucket, org, point)
    time.sleep(1)
CodePudding user response:
Your problem is not related to network connectivity; it is related to startup order. Although you define depends_on: influxdb for ads_agent, there is still a chance that InfluxDB has not finished starting up when your script tries to connect to it.
This is why it succeeds when you do it manually: the manual steps introduce a delay, and by the time the script runs, the database is already ready.
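For the reason, see this note from the Compose documentation:
depends_on does not wait for db and redis to be “ready” before starting web - only until they have been started. If you need to wait for a service to be ready, see Controlling startup order for more on this problem and strategies for solving it.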
To make sure your database is really up before your script starts, refer to "Control startup and shutdown order in Compose":
To handle this, design your application to attempt to re-establish a connection to the database after a failure. If the application retries the connection, it can eventually connect to the database.
The best solution is to perform this check in your application code, both at startup and whenever a connection is lost for any reason. However, if you don’t need this level of resilience, you can work around the problem with a wrapper script:
Use a tool such as wait-for-it, dockerize, sh-compatible wait-for, or RelayAndContainers template. These are small wrapper scripts which you can include in your application’s image to poll a given host and port until it’s accepting TCP connections. For example, to use wait-for-it.sh or wait-for to wrap your service’s command:
version: "2"
services:
  web:
    build: .
    ports:
      - "80:8000"
    depends_on:
      - "db"
    command: ["./wait-for-it.sh", "db:5432", "--", "python", "app.py"]
  db:
    image: postgres
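The wrapper scripts mentioned above poll a host and port until it accepts TCP connections. If you would rather not add a shell script to a Python image, the same idea can be sketched in a few lines of Python. The function name wait_for_port and its default timings are my own choices for illustration, not part of any of those tools:

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0,
                  interval: float = 0.5) -> bool:
    """Poll host:port until a TCP connection succeeds or `timeout` elapses.

    Hypothetical helper: the name and defaults are assumptions.
    `interval` is used both as the connect timeout and as the pause
    between attempts.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (for example
            # ConnectionRefusedError) while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

In the setup from the question, this would be called as wait_for_port("influxdb", 8086) before creating the client. Keep in mind that an open TCP port only shows that something is listening; it does not prove the database has finished its initial setup.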
Alternatively, write your own wrapper script to perform a more application-specific health check.
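For the script in the question, the in-application retry that the documentation recommends could look roughly like the sketch below. wait_until_ready is a hypothetical helper of mine; it takes any callable that reports readiness, so it does not depend on a particular client library:

```python
import time
from typing import Callable


def wait_until_ready(check: Callable[[], bool], attempts: int = 30,
                     delay: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call `check` until it returns True or `attempts` runs out.

    Hypothetical helper: name, defaults and the injectable `sleep`
    are assumptions made for this sketch.
    """
    for attempt in range(1, attempts + 1):
        try:
            if check():
                return True
        except Exception as exc:
            # For example a connection error raised by the client library.
            print(f"attempt {attempt}/{attempts} failed: {exc}")
        if attempt < attempts:
            sleep(delay)
    return False
```

In main.py you could then call wait_until_ready(client.ping) right after creating the client and exit with an error if it returns False. As far as I know, influxdb-client's InfluxDBClient has a ping() method that returns a boolean, but treat that as an assumption and check it against the version you have installed. Wrapping dbh.write in a similar retry would also cover the case where the connection drops later.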