How is airflow database managed periodically?

Time:01-18

I am running Airflow with Postgres as the metadata database.

During operation the web server became noticeably slow.

The cause was data continually accumulating in the dag_run and log tables of the metadata database (after connecting to Postgres and deleting the data directly, it became fast again).

Are there any Airflow options to clean the database periodically?

If there is no such option, we will try to delete the data directly from a DAG script.

Also, I find it strange that the web server slows down just because there is a lot of data. Does the web server fetch all the data when opening another window?

CodePudding user response:

The Airflow metadata database needs routine maintenance: regular backups plus periodic pruning of accumulated rows. This helps avoid problems such as data corruption and degraded query performance. A slow web server can have several causes (inefficient queries, insufficient resources, or simply unpruned tables), so it is essential to determine the underlying cause before choosing a fix.

Are there any airflow options to clean the db periodically?

Yes, Airflow provides options for periodically cleaning the database. You can set up a maintenance DAG that removes old entries from DagRun, TaskInstance, Log, XCom, Job, and SlaMiss to keep these tables from growing without bound. In addition, since Airflow 2.3 the database can be pruned with the built-in db clean command. Before cleaning up, back up the existing data for reference, or replicate the metadata database to a standby instance so the historical rows survive the purge.
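For the manual approach the asker mentions (deleting rows directly from a DAG), here is a minimal sketch of the per-table DELETE statements such a cleanup job might issue. The column names (dttm, execution_date, end_date) follow older Airflow schemas and should be verified against your installed version; the statement set and the `cutoff` bind parameter are assumptions for illustration:

```python
# Sketch of per-table DELETE statements for a manual metadata cleanup.
# Column names follow older Airflow schemas; verify against your version
# before running, and take a backup first.
CLEANUP_STATEMENTS = {
    "log":           "DELETE FROM log WHERE dttm < %(cutoff)s;",
    "dag_run":       "DELETE FROM dag_run WHERE execution_date < %(cutoff)s;",
    "task_instance": "DELETE FROM task_instance WHERE execution_date < %(cutoff)s;",
    "xcom":          "DELETE FROM xcom WHERE execution_date < %(cutoff)s;",
    "job":           "DELETE FROM job WHERE end_date < %(cutoff)s;",
    "sla_miss":      "DELETE FROM sla_miss WHERE execution_date < %(cutoff)s;",
}

def cleanup_sql(tables=None):
    """Return the DELETE statements to run, optionally limited to some tables."""
    tables = tables or list(CLEANUP_STATEMENTS)
    return [CLEANUP_STATEMENTS[t] for t in tables]
```

In a DAG these statements could be executed against the metadata database with a Postgres hook, binding `cutoff` to, say, 30 days before the run date, one transaction per table.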

Does the web server get all the data when opening another window?

No, the web server does not load the entire database when you open another window. However, every page it renders issues queries against the metadata database: the DAG list, grid, and log views all read from tables such as dag_run, task_instance, and log. When those tables hold millions of unpruned rows, those queries slow down, which is why every page feels sluggish until the tables are cleaned.

CodePudding user response:

You can purge old records by running:

airflow db clean [-h] --clean-before-timestamp CLEAN_BEFORE_TIMESTAMP [--dry-run] [--skip-archive] [-t TABLES] [-v] [-y]

(cli reference)

It is a quite common setup to include this command in a DAG that runs periodically.
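One way to wire that up is a small helper that builds the db clean invocation for a retention window, which can then be dropped into a BashOperator on a weekly schedule. The helper name and the 30-day window are assumptions; the flags (--clean-before-timestamp, --skip-archive, -y) come from the CLI synopsis above:

```python
from datetime import datetime, timedelta

def build_clean_command(retention_days: int, now: datetime) -> str:
    """Build an `airflow db clean` call that purges rows older than the window."""
    cutoff = (now - timedelta(days=retention_days)).strftime("%Y-%m-%d")
    # --skip-archive deletes the rows outright instead of copying them to
    # archive tables; -y skips the interactive confirmation prompt.
    return f"airflow db clean --clean-before-timestamp {cutoff} --skip-archive -y"
```

Without --skip-archive, db clean moves the purged rows into archive tables instead of deleting them, which is the safer default if you have not taken a backup.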
