Optimize delete from big size table-CodePudding

Tell me how to optimize the deletion of data from a Postgre table I have a table like this:

CREATE TABLE IF NOT EXISTS test (
    group varchar(255),
    id varchar(255),
    type varchar(255),
);

INSERT INTO test 
    (group, id, type) 
VALUES
    ('1', 'qw', 'START'),
    ('1', 'er', 'PROCESS'),
    ('1', 'ty', 'FINISH');

INSERT INTO test 
    (group, id, type) 
VALUES
    ('2', 'as', 'START'),
    ('2', 'df', 'PROCESS'),
    ('2', 'fg', 'ERROR');

INSERT INTO test 
    (group, id, type)   
VALUES
    ('3', 'zx', 'START'),
    ('3', 'cv', 'PROCESS'),
    ('3', 'ty', 'ERROR');

INSERT INTO test 
    (group, id, type)   
VALUES
    ('4', 'df', 'START'),
    ('4', 'gh', 'PROCESS'),
    ('4', 'fg', 'ERROR'),
    ('4', 'ty', 'FINISH');

group	id	type
1	qw	START
1	er	PROCESS
1	ty	FINISH
2	as	START
2	df	PROCESS
2	fg	ERROR
3	zx	START
3	cv	PROCESS
3	ty	ERROR
4	df	START
4	gh	PROCESS
4	fgv	ERROR
4	ty	FINISH

It contains operations combined by one value in the GROUP field But not all operations reach the end and do not have an operation with the value FINISH in the list, but have type ERROR, like the rows with GROUP 2 and 3 This table is 1 terabyte I want to delete all chains of operations that did not end with the FINISH status, what is the best way to optimize this?

My code looks like this:

delete from TEST for_delete
    where
        for_delete.group in (
            select group from TEST error
                where
                    error.type='ERROR'
                and
                    error.group NOT IN (select group from TEST where type='FINISH')
        );

But for a plate with such a volume, I think it will be terribly slow, can I somehow improve my code?

CodePudding user response：

Very often EXISTS conditions are faster than IN condition. And NOT EXISTS is almost always faster than NOT IN, so you could try something like this:

delete from test t1
where exists (select *
              from test t2
               where t2."group" = t1."group"
                 and t2."type" = 'ERROR'
                 and not exists (select 
                                 from test t3
                                 where t3."group" = t2."group"
                                   and t3."type" = 'FINISH'));

CodePudding user response：

Typically, in case like this you should use a MV (Materialized View). You can create a table where save all the id that you need to delete and keeping in sync using triggers. For example:

CREARE TABLE IF NOT EXISTS test_MV (
    id VARCHAR(255) PRIMARY KEY
);

You know the system and the data that you are using, you can also decide to keeping the table in sync using event. Using a MV, you can delete all the row using a easier and faster way:

delete from TEST for_delete
where
    for_delete.id in (
        select id from test_MV
    );

Sorry for my bad English