Is there a way to find the maximum, minimum, or average duration of all DagRun instances in Airflow? That is, all DAG runs from all DAGs, not just a single DAG.
I can't find a way to do this in the UI, or even a page with a programmatic or command-line example.
Answer:
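You can use the Airflow REST API to get the DAG runs for each DAG and calculate the statistics yourself.
An example that fetches the successful DAG runs for each active DAG and computes each run's duration (note that each request returns at most 100 entries here):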
import datetime
import requests
from requests.auth import HTTPBasicAuth

# Base URL of the stable REST API; it already ends with a slash.
airflow_server = "http://localhost:8080/api/v1/"
auth = HTTPBasicAuth("airflow", "airflow")

# List the active DAGs (at most 100 per request).
get_dags_url = f"{airflow_server}dags"
get_dag_params = {
    "limit": 100,
    "only_active": "true"
}
response = requests.get(get_dags_url, params=get_dag_params, auth=auth)
response.raise_for_status()
dags = response.json()["dags"]

# Only successful runs are requested, so each run should have an end_date.
get_dag_run_params = {
    "limit": 100,
    "state": "success",
}
for dag in dags:
    dag_id = dag["dag_id"]
    dag_run_url = f"{airflow_server}dags/{dag_id}/dagRuns"
    response = requests.get(dag_run_url, params=get_dag_run_params, auth=auth)
    response.raise_for_status()
    dag_runs = response.json()["dag_runs"]
    for dag_run in dag_runs:
        start_date = datetime.datetime.fromisoformat(dag_run['start_date'])
        end_date = datetime.datetime.fromisoformat(dag_run['end_date'])
        # Duration of this run as a timedelta.
        total = end_date - start_date
        print(dag_id, dag_run["dag_run_id"], total)
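
To get from the per-run durations to the maximum, minimum, and average the question asks for, something like the following could work. It is only a sketch: `duration_stats` is a hypothetical helper name, not part of Airflow, and the sample data is made up to mimic the `start_date` / `end_date` fields used in the loop above. Runs without both dates are skipped, on the assumption that they have not finished.

```python
import datetime
import statistics


def duration_stats(dag_runs):
    """Return min, max and mean duration for a list of DAG run dicts.

    Hypothetical helper: expects dicts with ISO 8601 'start_date' and
    'end_date' strings, as in the API responses used above.
    """
    seconds = []
    for run in dag_runs:
        # Skip runs that have not started or finished yet.
        if not run.get("start_date") or not run.get("end_date"):
            continue
        start = datetime.datetime.fromisoformat(run["start_date"])
        end = datetime.datetime.fromisoformat(run["end_date"])
        seconds.append((end - start).total_seconds())
    if not seconds:
        return None
    return {
        "min": datetime.timedelta(seconds=min(seconds)),
        "max": datetime.timedelta(seconds=max(seconds)),
        "mean": datetime.timedelta(seconds=statistics.mean(seconds)),
    }


# Made-up sample data in the same shape as the "dag_runs" entries.
sample_runs = [
    {"start_date": "2023-01-01T00:00:00+00:00", "end_date": "2023-01-01T00:10:00+00:00"},
    {"start_date": "2023-01-02T00:00:00+00:00", "end_date": "2023-01-02T00:20:00+00:00"},
    {"start_date": "2023-01-03T00:00:00+00:00", "end_date": None},
]
stats = duration_stats(sample_runs)
print(stats["min"], stats["max"], stats["mean"])
```

To cover all DAGs, you would collect the `dag_runs` lists from every DAG into one list and pass that to the helper. If a DAG has more than 100 runs, a single request will not return all of them, so you would probably need to page through the results (the API accepts an `offset` parameter alongside `limit`).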