FROM apache/airflow:2.2.4
# install mongodb-org-tools - mongodb tools for up-to-date mongodb that can handle --uri=mongodb+srv:// flag
RUN apt-get update && apt-get install -y gnupg software-properties-common && \
curl -fsSL https://www.mongodb.org/static/pgp/server-4.2.asc | apt-key add - && \
add-apt-repository 'deb https://repo.mongodb.org/apt/debian buster/mongodb-org/4.2 main' && \
apt-get update && \
apt-get install -y mongodb-org-tools
ADD requirements.txt /requirements.txt
RUN pip install -r /requirements.txt
We need to be able to use MongoDB CLI commands such as mongoimport and mongoexport in a BashOperator in our Airflow project, because our workflow involves moving data into a MongoDB database. We strongly prefer mongo commands like mongoimport over the Python pymongo package.
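Once the tools are in the image, a BashOperator only needs a shell command string. The sketch below shows one way to assemble that string for mongoimport, using shlex.quote so that URIs and file paths containing spaces or shell metacharacters reach mongoimport intact. The helper name build_mongoimport_command and the example URIs and paths are hypothetical; the flags (--uri, --collection, --file, --type, --headerline, --drop) are standard mongoimport options, and the target database is assumed to be part of the connection URI.

```python
import shlex


def build_mongoimport_command(uri, collection, file_path, file_type="json", drop=False):
    """Return a shell command string that imports file_path into collection.

    Hypothetical helper. Assumes the target database is included in the URI,
    e.g. mongodb+srv://user:pass@cluster.example.net/mydb.
    """
    if file_type not in ("json", "csv", "tsv"):
        raise ValueError(f"unsupported file type: {file_type!r}")
    args = [
        "mongoimport",
        f"--uri={uri}",
        f"--collection={collection}",
        f"--file={file_path}",
        f"--type={file_type}",
    ]
    if file_type in ("csv", "tsv"):
        # treat the first line of the file as field names, not data
        args.append("--headerline")
    if drop:
        # drop the target collection before importing
        args.append("--drop")
    # quote every argument so the shell started by BashOperator does not
    # split or expand characters in the URI or the file path
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    print(build_mongoimport_command(
        "mongodb+srv://user:pass@cluster.example.net/mydb",
        "games",
        "/data/my games.json",
    ))
```

In a DAG the returned string would be passed as bash_command to BashOperator. Note that BashOperator renders bash_command as a Jinja template, so a URI containing "{{" would be treated as template syntax; keeping credentials in an Airflow Connection rather than in the DAG file is usually preferable anyway.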
When we build the image, it seems we do not have permission to install the MongoDB tools; the build fails with the following error:
=> ERROR [cbb-airflow_airflow-webserver 2/4] RUN apt-get update && apt-get install -y gnupg software-properties-common && curl -fsSL https://www. 0.6s
------
> [cbb-airflow_airflow-webserver 2/4] RUN apt-get update && apt-get install -y gnupg software-properties-common && curl -fsSL https://www.mongodb.org/static/pgp/server-4.2.asc | apt-key add - && add-apt-repository 'deb https://repo.mongodb.org/apt/debian buster/mongodb-org/4.2 main' && apt-get update && apt-get install -y mongodb-org-tools:
#0 0.460 Reading package lists...
#0 0.592 E: List directory /var/lib/apt/lists/partial is missing. - Acquire (13: Permission denied)
------
failed to solve: executor failed running [/bin/bash -o pipefail -o errexit -o nounset -o nolog -c apt-get update && apt-get install -y gnupg software-properties-common && curl -fsSL https://www.mongodb.org/static/pgp/server-4.2.asc | apt-key add - && add-apt-repository 'deb https://repo.mongodb.org/apt/debian buster/mongodb-org/4.2 main' && apt-get update && apt-get install -y mongodb-org-tools]: exit code: 100
What is the best way to install the MongoDB CLI tools (for commands like mongoimport) in an image based on the official apache/airflow Docker image?
Answer:
Add USER root after the FROM statement.
The updated Dockerfile will look like this:
FROM apache/airflow:2.2.4
# switch to root: the base image runs as the unprivileged airflow user
USER root
# install mongodb-org-tools - mongodb tools for up-to-date mongodb that can handle --uri=mongodb+srv:// flag
RUN apt-get update && apt-get install -y gnupg software-properties-common && \
curl -fsSL https://www.mongodb.org/static/pgp/server-4.2.asc | apt-key add - && \
add-apt-repository 'deb https://repo.mongodb.org/apt/debian buster/mongodb-org/4.2 main' && \
apt-get update && \
apt-get install -y mongodb-org-tools && \
apt-get clean && rm -rf /var/lib/apt/lists/*
# switch back so pip and Airflow itself run as the airflow user
USER airflow
COPY requirements.txt /requirements.txt
RUN pip install --no-cache-dir -r /requirements.txt
TL;DR
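The user is set to airflow (UID 50000) in the apache/airflow:2.2.4 Docker image; you can confirm this by looking at the 49th instruction in the image's Dockerfile.
As a result, every later instruction, including the RUN apt-get step in your own Dockerfile, runs as the airflow user, which has no write access to system directories such as /var/lib/apt/lists. That is exactly what the "Permission denied" error from apt-get reports.
To overcome this, explicitly switch to the root user with USER root before the apt-get steps. This resolves the apt-get permission error. Once the system packages are installed, switch back with USER airflow, as the Airflow image documentation recommends, so that pip installs and Airflow itself keep running as the unprivileged user.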
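To check that the finished image runs as the unprivileged user and that this user can still find the tools installed as root, a small check like the one below could be run inside the container (for example with the container's python interpreter or from a PythonOperator). This is a minimal sketch; the function names preflight and missing_tools are hypothetical, and os.geteuid is POSIX-only. In the image above you would expect a non-zero UID (the airflow user) and a resolved path for each tool, since apt installs the binaries system-wide on the PATH.

```python
import os
import shutil


def preflight(tools=("mongoimport", "mongoexport")):
    """Return the effective UID and a mapping of each tool to its resolved path.

    Hypothetical helper. A value of None means the tool is not on PATH
    for the current user.
    """
    uid = os.geteuid()  # POSIX only; 0 means the process runs as root
    resolved = {tool: shutil.which(tool) for tool in tools}
    return uid, resolved


def missing_tools(resolved):
    """List the tools that could not be found on PATH."""
    return [tool for tool, path in resolved.items() if path is None]


if __name__ == "__main__":
    uid, resolved = preflight()
    print(f"effective uid: {uid}")
    for tool, path in resolved.items():
        print(f"{tool}: {path or 'NOT FOUND'}")
```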