We have an application going on for more than four years and a lot of data was being accumulated during this period. With the push for new features all this time, implementing archiving and clean up procedures where pushed back again and again.
Deleting records with DELETE
statements was no longer an option as it would have taken forever. So we created a script that was acting in a more aggressive way by creating a temporary table with the records we wanted to keep (last month of data) then dropping the original table then renaming the temporary table to the right name and recreating the constrains and indexes, etc
CREATE TABLE fred_to_keep AS SELECT * FROM fred WHILE last_update < current_date - 31 * INTERVAL '1 day';
DROP TABLE fred;
ALTER TABLE fred_to_keep RENAME TO fred;
-- create indexes and constraints here, change the owner etc
Given the aggressive nature of our approach this was tested lots of times on databases with a similar data size and repartition with our production database and the overall time taken was about 2 and 1/2 hours. From the script logs one of the tables was taken one hour to clean up.
Then we decided to run it in production. Everything was going fine until it hit the table fred
and after three hours the script was still not finished and we had to take the decision to kill the script and restore the database from the snapshot taken just before started. After waiting another half an hour we had no choice but to do the rollback.
Now coming from an Oracle background I know it is possible by interrogating dba_segments
to see the table size increasing while it is being created. Trying to google for a similar thing in PostreSQL did not take me anywhere and I am wondering is there any way to actually see the progress of CREATE TABLE AS SELECT * FROM fred WHERE ...
while this is being executed? Being able to see the space taken by the table being created and comparing it against expected size would have helped us to take a much more informed decision about killing the script or leaving it run. It is a bad feeling to think we killed it maybe just one minute before finishing.
CodePudding user response:
Example of using pg_relation_size()
to monitor table size.
create table dog(id integer);
select pg_relation_size('dog');
pg_relation_size
------------------
0
insert into dog values (1);
select pg_relation_size('dog');
pg_relation_size
------------------
8192
insert into dog select * from generate_series(1, 100000);
select pg_relation_size('dog');
pg_relation_size
------------------
3629056
If you do this in a explicit transaction e.g. Begin; <DDL, DML statements> Commit;
then other sessions will not be able to see/monitor until the Commit
is done.
CodePudding user response:
If it is all done in one transaction, there is no good way to do this in postgresql from SQL, because until the creation of the table is committed, no one else can get there eyes on it. You can use file system tools to watch the file grow, but you first need to infer which file is the right one. If you don't have access to the file system, I'm afraid you are out of luck.