I have a script that imports data from CSV files and loads it into database tables. The tables are partitioned by a column such as customer_id. The script first loads the data into a staging table, checks the unique key/FK constraints, and deletes the rows that violate them. It then drops the existing partition from the main table and attaches the staging table as the new partition. My question: is this the best approach for importing the data? The other approach I can think of is to not use partitions at all: use only a staging table and, after cleaning the data, delete the existing data from the main table and import the cleaned data into it.
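For context, a minimal sketch of the workflow described above. All names here are illustrative assumptions: a list-partitioned table sales (partitioned by customer_id), a customers table as the FK target, a unique key on (customer_id, order_id), a staging table staging_sales, and customer 42 as the partition being refreshed:

    BEGIN;

    -- 1. Load the CSV into a staging table shaped like the parent.
    CREATE TABLE staging_sales (LIKE sales INCLUDING DEFAULTS INCLUDING CONSTRAINTS);
    COPY staging_sales FROM '/path/to/customer_42.csv' WITH (FORMAT csv, HEADER true);

    -- 2. Delete rows that would violate the FK to customers.
    DELETE FROM staging_sales s
    WHERE NOT EXISTS (SELECT 1 FROM customers c WHERE c.customer_id = s.customer_id);

    -- 3. Delete duplicates on the logical unique key, keeping one row per key.
    DELETE FROM staging_sales a
    USING staging_sales b
    WHERE a.ctid < b.ctid
      AND a.customer_id = b.customer_id
      AND a.order_id = b.order_id;

    -- 4. Swap the partition: drop the old one, attach the cleaned staging table.
    DROP TABLE IF EXISTS sales_p42;
    ALTER TABLE staging_sales
      ADD CONSTRAINT staging_sales_partition_chk CHECK (customer_id = 42);  -- lets ATTACH skip the validation scan
    ALTER TABLE sales ATTACH PARTITION staging_sales FOR VALUES IN (42);

    COMMIT;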
CodePudding user response:
If you can load data partitionwise, that is going to be better. Deleting many rows in a PostgreSQL table is painful, and DROP TABLE will always win.
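As a rough illustration of the difference, reusing the hypothetical sales / staging_sales names and customer 42 from the question:

    -- Partitioned approach: replacing one customer's data is a cheap DDL operation.
    DROP TABLE sales_p42;
    ALTER TABLE sales ATTACH PARTITION staging_sales FOR VALUES IN (42);

    -- Non-partitioned approach: every old row is marked dead one by one, then re-inserted.
    DELETE FROM sales WHERE customer_id = 42;
    INSERT INTO sales SELECT * FROM staging_sales;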
CodePudding user response:
Keep in mind that in PostgreSQL, a DELETE doesn't actually remove data from the table -- it merely marks each row as invisible to later transactions. Later, when the table is vacuumed, the space occupied by those invisible rows is marked as re-usable for future INSERTs and UPDATEs. Only a VACUUM FULL, which rewrites the table, actually returns the space to the operating system. Therefore, "import[ing] it into the main table after deleting the existing data" would cause bloat.
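A small self-contained demonstration of that behaviour (the table name, row count, and payload here are invented purely for illustration):

    -- Build a throw-away table and make half of its rows dead.
    CREATE TABLE bloat_demo (id int, payload text);
    INSERT INTO bloat_demo
    SELECT g, repeat('x', 100) FROM generate_series(1, 1000000) AS g;

    SELECT pg_size_pretty(pg_relation_size('bloat_demo'));  -- baseline size

    DELETE FROM bloat_demo WHERE id % 2 = 0;                 -- rows are only marked invisible
    SELECT pg_size_pretty(pg_relation_size('bloat_demo'));  -- unchanged: dead tuples still occupy space

    VACUUM bloat_demo;                                       -- space becomes reusable for future INSERTs/UPDATEs
    SELECT pg_size_pretty(pg_relation_size('bloat_demo'));  -- still roughly unchanged: dead space is interleaved

    VACUUM FULL bloat_demo;                                  -- rewrites the table compactly
    SELECT pg_size_pretty(pg_relation_size('bloat_demo'));  -- roughly halved: space returned to the OS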
A DROP TABLE will immediately reclaim the space. As such, it makes more sense to work with partitions and drop partitions to reclaim space.
More information about this behavior can be found in the documentation.