Tell me how to optimize the deletion of data from a Postgre table I have a table like this:
CREATE TABLE IF NOT EXISTS test (
group varchar(255),
id varchar(255),
type varchar(255),
);
INSERT INTO test
(group, id, type)
VALUES
('1', 'qw', 'START'),
('1', 'er', 'PROCESS'),
('1', 'ty', 'FINISH');
INSERT INTO test
(group, id, type)
VALUES
('2', 'as', 'START'),
('2', 'df', 'PROCESS'),
('2', 'fg', 'ERROR');
INSERT INTO test
(group, id, type)
VALUES
('3', 'zx', 'START'),
('3', 'cv', 'PROCESS'),
('3', 'ty', 'ERROR');
INSERT INTO test
(group, id, type)
VALUES
('4', 'df', 'START'),
('4', 'gh', 'PROCESS'),
('4', 'fg', 'ERROR'),
('4', 'ty', 'FINISH');
group | id | type |
---|---|---|
1 | qw | START |
1 | er | PROCESS |
1 | ty | FINISH |
2 | as | START |
2 | df | PROCESS |
2 | fg | ERROR |
3 | zx | START |
3 | cv | PROCESS |
3 | ty | ERROR |
4 | df | START |
4 | gh | PROCESS |
4 | fgv | ERROR |
4 | ty | FINISH |
It contains operations combined by one value in the GROUP field But not all operations reach the end and do not have an operation with the value FINISH in the list, but have type ERROR, like the rows with GROUP 2 and 3 This table is 1 terabyte I want to delete all chains of operations that did not end with the FINISH status, what is the best way to optimize this?
My code looks like this:
delete from TEST for_delete
where
for_delete.group in (
select group from TEST error
where
error.type='ERROR'
and
error.group NOT IN (select group from TEST where type='FINISH')
);
But for a plate with such a volume, I think it will be terribly slow, can I somehow improve my code?
CodePudding user response:
Very often EXISTS conditions are faster than IN condition. And NOT EXISTS
is almost always faster than NOT IN
, so you could try something like this:
delete from test t1
where exists (select *
from test t2
where t2."group" = t1."group"
and t2."type" = 'ERROR'
and not exists (select
from test t3
where t3."group" = t2."group"
and t3."type" = 'FINISH'));
CodePudding user response:
Typically, in case like this you should use a MV (Materialized View). You can create a table where save all the id that you need to delete and keeping in sync using triggers. For example:
CREARE TABLE IF NOT EXISTS test_MV (
id VARCHAR(255) PRIMARY KEY
);
You know the system and the data that you are using, you can also decide to keeping the table in sync using event. Using a MV, you can delete all the row using a easier and faster way:
delete from TEST for_delete
where
for_delete.id in (
select id from test_MV
);
Sorry for my bad English