Goal
I'm trying to run a simple DAG which creates a pandas DataFrame and writes to a file. The DAG is being run in a Docker container with Airflow, and the file is being written to a named volume.
Problem
When I start the container, I get the error:
Broken DAG: [/usr/local/airflow/dags/simple_datatest.py] [Errno 13] Permission denied: '/usr/local/airflow/data/local_data_input.csv'
Question
Why am I getting this error? And how can I fix this so that it writes properly?
Context
I am loosely following a tutorial here, but I've modified the DAG. I'm using the puckel/docker-airflow
image from Docker Hub. I've attached a volume pointing to the appropriate DAG, and I've created another volume to contain the data written within the DAG (created by running docker volume create airflow-data
).
The run command is:
docker run -d -p 8080:8080 \
-v /path/to/local/airflow/dags:/usr/local/airflow/dags \
-v airflow-data:/usr/local/airflow/data:Z \
puckel/docker-airflow \
webserver
The DAG located at the /usr/local/airflow/dags
path on the container is defined as follows:
import airflow
from airflow import DAG
from airflow.operators import BashOperator
from datetime import datetime, timedelta
import pandas as pd
# Following are defaults which can be overridden later on
default_args = {
'owner': 'me',
'depends_on_past': False,
'start_date': datetime(2021, 12, 31),
'email': ['[email protected]'],
'email_on_failure': False,
'email_on_retry': False,
'retries': 1,
'retry_delay': timedelta(minutes=1),
}
dag = DAG('datafile', default_args=default_args)
def task_make_local_dataset():
print("task_make_local_dataset")
local_data_create=pd.DataFrame({'a':[1,2,3], 'b':[4,5,6]})
local_data_create.to_csv('/usr/local/airflow/data/local_data_input.csv')
t1 = BashOperator(
task_id='write_local_dataset',
python_callable=task_make_local_dataset(),
bash_command='python3 ~/airflow/dags/datatest.py',
dag=dag)
The error in the DAG appears to be in the line
local_data_create.to_csv('/usr/local/airflow/data/local_data_input.csv')
I don't have permission to write to this location.
Attempts
I've tried changing the location of the data
directory on the container, but airflow can't access it. Do I have to change permissions? It seems that this is a really simple thing that most people would want to be able to do: write to a container. I'm guessing I'm just missing something.
CodePudding user response:
Don't use Puckel Docker Image. It's not been maintained for years, Airflow 1.10 has reached End Of Life in June 2021. You should only look at Airflow 2 and Airflow has official reference image that you can use:
Airflow 2 has also Quick-Start guides you can use - based on the image and docker compose: https://airflow.apache.org/docs/apache-airflow/stable/start/index.html
And it also has Helm Chart that can be used to productionize your setup. https://airflow.apache.org/docs/helm-chart/stable/index.html
Don't waste your (and other's) time on Puckel and Airflow 1.10.