Why won't my custom Dockerfile connect over the docker-compose network when other services will?

The problem

I am attempting to create a docker-compose file that will host three services: InfluxDB, Grafana, and a custom script, built from a custom Dockerfile, that fills the database. I am having networking issues: the custom script cannot connect to InfluxDB because of a connection refused error (shown below).

What is working so far

Interestingly, when I remove the custom script service (called ads_agent) from my docker-compose file and either run the script directly on my host or build and run the Dockerfile in its own container, it connects just fine.

What's the difference between the two

My script reads an environment variable called KTS_TELEMETRY_INFLUXDB_URL, which the InfluxDB client uses as the URL to connect to. When I run the script from my command line, I can use "http://localhost:8086" for the URL, and that works. When I wrap the script in a Docker container, I use my local machine's LAN IP address instead, because inside the container localhost refers to the container itself. That also works just fine.

From within my docker-compose, since all three services are on the same network, I'm using "http://influxdb:8086" since that host name should be bound to that service's network interface. And indeed it is, because Grafana is connecting just fine using that URL. Sadly, when I try this with the script, I get connection refused.

The error

urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f18c1fec970>: Failed to establish a new connection: [Errno 111] Connection refused

My code

This is my docker-compose.yaml

version: "3"
services:
  influxdb:
    container_name: influxdb
    image: influxdb:2.0.9-alpine # influxdb:latest
    networks:
      - telemetry_network
    ports:
      - 8086:8086
    volumes:
      - influxdb-storage:/var/lib/influxdb2
    restart: always
    environment:
      - DOCKER_INFLUXDB_INIT_MODE=setup
      - DOCKER_INFLUXDB_INIT_USERNAME=$KTS_TELEMETRY_INFLUXDB_USERNAME
      - DOCKER_INFLUXDB_INIT_PASSWORD=$KTS_TELEMETRY_INFLUXDB_PASSWORD
      - DOCKER_INFLUXDB_INIT_ORG=$KTS_TELEMETRY_INFLUXDB_ORG
      - DOCKER_INFLUXDB_INIT_BUCKET=$KTS_TELEMETRY_INFLUXDB_BUCKET
      - DOCKER_INFLUXDB_INIT_RETENTION=$KTS_TELEMETRY_INFLUXDB_RETENTION
      - DOCKER_INFLUXDB_INIT_ADMIN_TOKEN=$KTS_TELEMETRY_INFLUXDB_TOKEN
  grafana:
    container_name: grafana
    image: grafana/grafana:8.1.7 # grafana/grafana:latest
    networks:
      - telemetry_network
    ports:
      - 3000:3000
    volumes:
      - grafana-storage:/var/lib/grafana
    restart: always
    depends_on:
      - influxdb
  ads_agent:
    container_name: ads_agent
    build: ./ads_agent
    networks:
      - telemetry_network
    restart: always
    depends_on:
      - influxdb
    environment:
      - KTS_TELEMETRY_INFLUXDB_URL=http://influxdb:8086
      - KTS_TELEMETRY_INFLUXDB_TOKEN=$KTS_TELEMETRY_INFLUXDB_TOKEN
      - KTS_TELEMETRY_INFLUXDB_ORG=$KTS_TELEMETRY_INFLUXDB_ORG
      - KTS_TELEMETRY_INFLUXDB_BUCKET=$KTS_TELEMETRY_INFLUXDB_BUCKET

networks:
  telemetry_network:

volumes:
  influxdb-storage:
  grafana-storage:

This is my ads_agent/Dockerfile

FROM python:3.9
COPY requirements.txt .
RUN pip install --upgrade pip
RUN pip install -r /requirements.txt
COPY main.py .
ENTRYPOINT /usr/local/bin/python3 /main.py

ads_agent/requirements.txt contains only influxdb-client, and this is my ads_agent/main.py

import os
from influxdb_client import InfluxDBClient, Point, WritePrecision
from influxdb_client.client.write_api import SYNCHRONOUS
from datetime import datetime
import random
import time

token = os.environ["KTS_TELEMETRY_INFLUXDB_TOKEN"]
org = os.environ["KTS_TELEMETRY_INFLUXDB_ORG"]
bucket = os.environ["KTS_TELEMETRY_INFLUXDB_BUCKET"]
url = os.environ["KTS_TELEMETRY_INFLUXDB_URL"]

client = InfluxDBClient(url=url, token=token)
dbh = client.write_api(write_options=SYNCHRONOUS)

while True:
    symbol_name = 'rand_num'
    value = random.random()
    timestamp = datetime.utcnow()
    print(timestamp, symbol_name, value)
    point = Point("mem") \
        .field(symbol_name, value) \
        .time(timestamp, WritePrecision.NS)
    dbh.write(bucket, org, point)
    time.sleep(1)

CodePudding user response：

Your problem is not related to network connectivity; it is related to startup order. Although you define depends_on: influxdb for ads_agent, there is still a chance that InfluxDB has not finished starting up when your script first tries to connect to it.

This is why it succeeds when you start the script manually: the manual steps introduce a delay, and by the time the script runs the database is already ready.
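One way to check this diagnosis is to look at what kind of failure occurs. Errno 111 (connection refused) means the hostname resolved and the host answered, but nothing was listening on the port at that moment. A DNS or routing problem would normally show up as a name-resolution error or a timeout instead. The following is a minimal sketch using only the standard library; the helper name classify_connection is hypothetical and not part of the original script.

```python
import socket


def classify_connection(host, port, timeout=3.0):
    """Return a short label describing how a TCP connection attempt ends."""
    try:
        # Step 1: name resolution. On a Compose network the service name
        # (for example "influxdb") is resolved by Docker's embedded DNS.
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failure"
    try:
        # Step 2: an actual TCP connection to the resolved address.
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except ConnectionRefusedError:
        # Host reachable, but no process is accepting on this port (yet).
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError:
        return "other_error"
```

Called as classify_connection("influxdb", 8086) from inside the ads_agent container, a result of "refused" right after startup followed by "ok" a few seconds later would be consistent with a startup-order problem rather than a network problem.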

The reason, from the Compose documentation:

depends_on does not wait for db and redis to be “ready” before starting web - only until they have been started.

To make sure the database is really up before your script starts, refer to "Control startup and shutdown order in Compose":

To handle this, design your application to attempt to re-establish a connection to the database after a failure. If the application retries the connection, it can eventually connect to the database.
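Applied to the script in the question, the retry idea could be sketched as follows. This is an illustration under assumptions, not the influxdb-client library's own retry mechanism: the helper name retry and its parameters are hypothetical, and it catches Exception broadly because the exact exception types raised by a failed write are not confirmed here.

```python
import time


def retry(operation, attempts=30, delay=2.0, exceptions=(Exception,),
          sleep=time.sleep):
    """Call operation() until it succeeds or the attempts are used up."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except exceptions as error:
            last_error = error
            print(f"attempt {attempt}/{attempts} failed: {error!r}")
            if attempt < attempts:
                # Wait before trying again so the database can finish starting.
                sleep(delay)
    # Every attempt failed: re-raise the last error so the caller sees it.
    raise last_error
```

In main.py the write inside the loop could then become retry(lambda: dbh.write(bucket, org, point)), so that a refused connection during InfluxDB startup, or a later outage, leads to another attempt instead of a crash.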

The best solution is to perform this check in your application code, both at startup and whenever a connection is lost for any reason. However, if you don’t need this level of resilience, you can work around the problem with a wrapper script:

Use a tool such as wait-for-it, dockerize, sh-compatible wait-for, or the RelayAndContainers template. These are small wrapper scripts which you can include in your application’s image to poll a given host and port until it’s accepting TCP connections. For example, to use wait-for-it.sh or wait-for to wrap your service’s command:

version: "2"
services:
  web:
    build: .
    ports:
      - "80:8000"
    depends_on:
      - "db"
    command: ["./wait-for-it.sh", "db:5432", "--", "python", "app.py"]
  db:
    image: postgres

Alternatively, write your own wrapper script to perform a more application-specific health check.
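For the setup in the question, such an application-specific check could poll InfluxDB's /health endpoint, which InfluxDB 2.x exposes and which reports "status": "pass" once the server is ready. The sketch below uses only the standard library. The function names, the default timeout values, and the fallback URL are assumptions made for illustration.

```python
import json
import os
import time
import urllib.request


def influxdb_is_healthy(base_url, timeout=2.0):
    """Return True if GET <base_url>/health reports status "pass"."""
    try:
        health_url = base_url.rstrip("/") + "/health"
        with urllib.request.urlopen(health_url, timeout=timeout) as response:
            payload = json.load(response)
    except (OSError, ValueError):
        # OSError covers refused connections, timeouts, and HTTP errors;
        # ValueError covers a response body that is not valid JSON.
        return False
    return isinstance(payload, dict) and payload.get("status") == "pass"


def wait_until(check, timeout=60.0, interval=1.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll check() until it returns True or the timeout elapses."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def main():
    """Return 0 once InfluxDB is healthy, 1 if it never becomes healthy."""
    # Assumed fallback URL, matching the service name in the compose file.
    url = os.environ.get("KTS_TELEMETRY_INFLUXDB_URL", "http://influxdb:8086")
    if wait_until(lambda: influxdb_is_healthy(url)):
        return 0
    print(f"InfluxDB at {url} did not become healthy in time")
    return 1
```

Saved as a hypothetical wait_for_influxdb.py that ends with sys.exit(main()) under the usual __main__ guard, it could be run before main.py in the container's entrypoint, so the agent only starts once the database reports that it is ready.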