Airflow on Docker: Can't Write to Volume (Permission Denied)


Goal

I'm trying to run a simple DAG that creates a pandas DataFrame and writes it to a file. The DAG runs in a Docker container with Airflow, and the file is written to a named volume.

Problem

When I start the container, I get the error:

Broken DAG: [/usr/local/airflow/dags/simple_datatest.py] [Errno 13] Permission denied: '/usr/local/airflow/data/local_data_input.csv'

Question

Why am I getting this error, and how can I fix it so that the file is written properly?

Context

I am loosely following a tutorial here, but I've modified the DAG. I'm using the puckel/docker-airflow image from Docker Hub. I've bind-mounted the local directory containing the DAG, and I've created a named volume (by running docker volume create airflow-data) to hold the data written by the DAG.

The run command is:

docker run -d -p 8080:8080 \
-v /path/to/local/airflow/dags:/usr/local/airflow/dags \
-v airflow-data:/usr/local/airflow/data:Z \
puckel/docker-airflow \
webserver
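
As a sanity check, one way to see which user the container runs as and who owns the mounted data directory (the container name below is a placeholder for whatever docker ps reports):

docker exec -it <container_name> bash -c 'id; ls -ld /usr/local/airflow/data'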

The DAG, located at /usr/local/airflow/dags on the container, is defined as follows:

import airflow
from airflow import DAG
from airflow.operators import BashOperator
from datetime import datetime, timedelta
import pandas as pd

# Following are defaults which can be overridden later on
default_args = {
    'owner': 'me',
    'depends_on_past': False,
    'start_date': datetime(2021, 12, 31),
    'email': ['[email protected]'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=1),
}

dag = DAG('datafile', default_args=default_args)

def task_make_local_dataset():
  print("task_make_local_dataset")
  local_data_create=pd.DataFrame({'a':[1,2,3], 'b':[4,5,6]})
  local_data_create.to_csv('/usr/local/airflow/data/local_data_input.csv')

t1 = BashOperator(
    task_id='write_local_dataset',
    python_callable=task_make_local_dataset(),
    bash_command='python3 ~/airflow/dags/datatest.py',
    dag=dag)

The error in the DAG appears to be in the line

local_data_create.to_csv('/usr/local/airflow/data/local_data_input.csv')

I don't have permission to write to this location.

Attempts

I've tried changing the location of the data directory inside the container, but Airflow can't access that either. Do I have to change permissions? This seems like a really simple thing that most people would want to do: write a file to a mounted volume from inside a container. I'm guessing I'm just missing something.
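
For reference, one workaround I'm considering is chown-ing the named volume from a throwaway container so that it is writable by the UID the Airflow process runs as (a sketch only; the UID 1000 is an assumption that should be confirmed with the id check above):

docker run --rm -v airflow-data:/data alpine chown -R 1000:1000 /data

After that, restarting the Airflow container should let the DAG write to /usr/local/airflow/data.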

CodePudding user response:

Don't use the Puckel Docker image. It hasn't been maintained for years, and Airflow 1.10 reached end-of-life in June 2021. You should only look at Airflow 2, which has an official reference image (apache/airflow) that you can use instead.
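
For example, the official image can be pulled straight from Docker Hub (the version tag below is illustrative; pick a current 2.x release):

docker pull apache/airflow:2.2.3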

Airflow 2 also has quick-start guides you can use, based on the official image and Docker Compose: https://airflow.apache.org/docs/apache-airflow/stable/start/index.html
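
For illustration, the quick start boils down to roughly the following (the version in the URL is an example; note that setting AIRFLOW_UID is exactly what avoids the kind of volume permission error described above):

curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.2.3/docker-compose.yaml'
mkdir -p ./dags ./logs ./plugins
echo -e "AIRFLOW_UID=$(id -u)" > .env
docker compose up airflow-init
docker compose up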

It also has a Helm chart that can be used to productionize your setup: https://airflow.apache.org/docs/helm-chart/stable/index.html
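
A minimal install with the official chart looks roughly like this (the release name and namespace are placeholders):

helm repo add apache-airflow https://airflow.apache.org
helm repo update
helm upgrade --install airflow apache-airflow/airflow --namespace airflow --create-namespace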

Don't waste your (and others') time on Puckel and Airflow 1.10.
