This is my scriptlet for triggering the Python script:
task1 = BashOperator(
    task_id='task1_monthly_run',
    bash_command="python /opt/airflow/dags/scripts/validation/runner.py arg1 arg2",
    dag=dag)
I pass two arguments to the Python script because I use the same script with different arguments.
This excerpt from runner.py shows how the arguments are used:
import sys
arg_list = sys.argv[1:]
model_name = arg_list[0]
val_date_son = arg_list[1]
The code works fine and the arguments are passed correctly, but the Airflow UI shows an error about the arguments:
Broken DAG: [/opt/airflow/dags/scripts/validation/runner.py] Traceback (most recent call last):
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/opt/airflow/dags/scripts/validation/runner.py", line 10, in <module>
    val_date_son = arg_list[1]
IndexError: list index out of range
How can I get rid of this error?
CodePudding user response:
Airflow discovers DAGs by scanning all .py files in the DAG directory. As an optimization, Airflow only considers .py files that contain the strings "dag" and "airflow" (dag_discovery_safe_mode). In your case runner.py probably contains both strings, so Airflow tries to parse the file and look for DAGs. Since the file is a plain script, Airflow hits an error and raises the Broken DAG message.
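The safe-mode check described above can be sketched roughly as follows. This is a simplified illustration, not Airflow's actual code: the function name looks_like_dag_file is hypothetical, and the case-insensitive comparison is an assumption.

```python
def looks_like_dag_file(source: bytes) -> bool:
    """Return True if the file content mentions both 'dag' and 'airflow'."""
    lowered = source.lower()  # assumption: the check ignores case
    return all(word in lowered for word in (b"dag", b"airflow"))


# A plain helper script that merely mentions both words still qualifies,
# so it would be handed to the DAG parser.
runner_source = b"# called from the airflow dag\nimport sys\nmodel_name = sys.argv[1]\n"
print(looks_like_dag_file(runner_source))

# A script that mentions neither word would be skipped.
print(looks_like_dag_file(b"import sys\nprint(sys.argv)\n"))
```

Even a comment or a path containing both words is enough to make a helper script a parsing candidate.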
You are able to execute the DAG successfully despite the Broken DAG message because the message is not about the DAG; it is about runner.py. Executing the script as you intended (via BashOperator) works fine, but trying to parse the script as a DAG file results in an error.
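To see why parsing fails while the task itself succeeds, here is a small sketch of what happens at import time. It assumes that when the file is imported by the parser, sys.argv belongs to the Airflow process rather than to your task. The read_args helper and the example argv values are hypothetical.

```python
def read_args(argv):
    """Mimic the top of runner.py: take everything after the program name."""
    arg_list = argv[1:]
    model_name = arg_list[0]
    val_date_son = arg_list[1]
    return model_name, val_date_son


# Run through BashOperator: both task arguments are present.
print(read_args(["runner.py", "arg1", "arg2"]))

# Imported by the DAG parser: argv holds the Airflow command line instead
# (values assumed here), so there is no second argument to read.
try:
    read_args(["airflow", "scheduler"])
except IndexError as exc:
    print(f"IndexError: {exc}")
```

This would also be consistent with the traceback pointing at the arg_list[1] line rather than at arg_list[0].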
To solve your issue, add an .airflowignore file. It tells Airflow not to parse .py files under /dags/scripts/, since you should not store DAG files under the scripts folder.
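As a sketch of what this could look like: a .airflowignore file placed in /opt/airflow/dags containing the single line scripts/. Assuming regular-expression syntax (which I believe is the default in Airflow 2.x), each line is searched for in the file path relative to the DAG folder. The Python below only imitates that matching to show which files such a pattern would exclude; is_ignored and the DAG file name are hypothetical.

```python
import re

# Assumed content of /opt/airflow/dags/.airflowignore, one pattern per line.
AIRFLOWIGNORE_LINES = ["scripts/"]


def is_ignored(relative_path: str, patterns=AIRFLOWIGNORE_LINES) -> bool:
    """Imitate regexp-style matching: ignore a path if any pattern occurs in it."""
    return any(re.search(pattern, relative_path) for pattern in patterns)


# The helper script sits under scripts/, so it would be skipped.
print(is_ignored("scripts/validation/runner.py"))

# A DAG file at the top of the DAG folder would still be parsed.
print(is_ignored("monthly_validation_dag.py"))
```

If your installation is configured for glob syntax instead, check the pattern against the documentation for your version.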
Alternatively, you can remove the occurrences of the "dag" and "airflow" strings from runner.py. The Broken DAG message will then disappear, but Airflow will still inspect the file, so the .airflowignore solution is preferred.