How to add a new user to a docker image when running a distributed Airflow architecture using docker-compose

Time:09-26

So the main issue is to run container-based processing when the Airflow Celery worker itself runs inside a Docker container. To solve this, the user running Airflow (inside the container) must belong to the docker group on the host, and /var/run/docker.sock needs to be mounted into the Airflow container.
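A minimal sketch of the relevant part of the docker-compose.yml (the service name and image tag are illustrative, not my exact file):

```yaml
services:
  airflow-worker:              # illustrative service name
    image: my-airflow:2.1.0    # image built from the Dockerfile shown below
    user: "${AIRFLOW_UID}:0"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
```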

I'm launching the whole setup from a docker-compose.yml where I set AIRFLOW_UID=1234 and AIRFLOW_GID=0. I'm using a Docker image based on the official Airflow image, with the addition that I have created 'newuser' with uid=1234 and a 'docker' group whose gid matches the one on the host.

However, when I launch the setup, the user I created in the image build phase is not there, and I can't understand why. Instead there is a 'default' user with uid=1234 and gid=0. This 'default' user is also created if I use the official image and just define AIRFLOW_UID in the docker-compose.yml.

Dockerfile:

FROM apache/airflow:2.1.0

USER root
RUN useradd newuser -u 1234 -g 0

RUN groupadd --gid 986 docker \
    && usermod -aG docker newuser
USER newuser

Also, if I don't create newuser and instead just add the airflow user to the docker group, then the airflow user really is added to the docker group as it should be.

Does docker-compose overwrite the users created at the image build phase? What would be the best way to solve this issue?

CodePudding user response:

I'm launching the whole setup from a docker-compose.yml where I set AIRFLOW_UID=1234 and AIRFLOW_GID=0. I'm using a Docker image based on the official Airflow image, with the addition that I have created 'newuser' with uid=1234 and a 'docker' group whose gid matches the one on the host.

You should not do that at all. The user will be created automatically by the Airflow image's entrypoint when you use a different UID than the default - see https://airflow.apache.org/docs/docker-stack/entrypoint.html#allowing-arbitrary-user-to-run-the-container. In fact, everything you want to do should be possible without having to extend the Airflow image.
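For example, the quick-start approach is simply to record your host UID in a .env file (which docker-compose reads by default) before starting the stack; the entrypoint then creates the matching user inside the container for you:

```shell
# Write the host user's UID into .env so docker-compose passes it as AIRFLOW_UID;
# the Airflow entrypoint creates a matching container user automatically.
echo "AIRFLOW_UID=$(id -u)" > .env
cat .env
```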

What you need to do is create the user that you want to run inside the container ON THE HOST - not in the container. And it should belong to the docker group ON THE HOST - not in the container.

Docker works in such a way that it uses the same kernel (and the same numeric user IDs) as the host system, so when you run something as a user in the container, it runs with that "host" user's privileges. So if you map your Docker socket into the container, the container will be able to use the socket and run docker commands, because it will have the right permissions on the host.
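A quick way to see this (a sketch, using an ordinary file as a stand-in for /var/run/docker.sock): the permission check is done purely on numeric IDs, so the user and group *names* inside the container do not matter.

```shell
# The kernel authorises access by numeric UID/GID only - names are irrelevant.
touch demo.sock                   # stand-in for /var/run/docker.sock
stat -c '%u:%g' demo.sock         # prints the numeric owner:group the check uses
```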

Therefore (in case you run docker-compose as a regular user who already belongs to the docker group), the best way is the one suggested in the quick-start - i.e. run Airflow with the "host" user that you are logged in with: https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html

This also makes all the files created in the container belong to the "logged in" user (if they are created in directories mounted inside, such as the logs directory).

But if your goal is to use it in an "unattended" environment, then likely creating the new user on your host and adding that user to both group 0 and the docker group should solve the problem.
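A minimal sketch of that host-side setup (run as root, or prefix each command with sudo; the user name 'airflowsvc' and UID 1234 are illustrative values, not prescribed by Airflow):

```shell
# Create the service user ON THE HOST with primary group 0 (the root group),
# then add it to the host's docker group.
getent group docker > /dev/null || groupadd docker
useradd --uid 1234 --gid 0 airflowsvc
usermod -aG docker airflowsvc
id airflowsvc
```

You would then set AIRFLOW_UID=1234 in the compose environment so the container runs with the same numeric UID.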

CodePudding user response:

As a complement to the great answer from @JarekPotiuk: if, as indicated in your comments, the problem is related to permission issues when using the DockerOperator, you can try the following approach.

The idea is to include in the Airflow docker-compose.yml file a service based on the bobrik/socat image. Something like:

docker-proxy:
  image: bobrik/socat
  command: "TCP4-LISTEN:2375,fork,reuseaddr UNIX-CONNECT:/var/run/docker.sock"
  ports:
    - 2375:2375
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock
  restart: always

This will effectively create a bridge to your host's Docker daemon and will allow you to run your containers with the DockerOperator without permission issues, by providing an appropriate value for the docker_url argument:

docker_based_task = DockerOperator(
    task_id="a_docker_based_one",
    docker_url="tcp://docker-proxy:2375",
    # ...
)