I do have the following query and I'm running it against a postgresql db which has more than 10M entries in table account_message
and 1M entries in table message
.
Postgresql is in Version PostgreSQL 11.12, compiled by Visual C build 1914, 64-bit
Is there any way to make this query faster because it takes more than 2 days already and did not finish yet.
DELETE FROM account_message WHERE message_id in
(SELECT t2.id FROM message t2 WHERE NOT EXISTS
(SELECT 1 FROM customer t1 WHERE
t1.username = t2.username));
Table account_message
has the following columns:
id (bigint)(primary key)
user_id (bigint)
message_id (bigint)
isRead (boolean)
isDeleted (boolean)
Table message
has the following columns:
id (bigint)(primary key)
username (character varying)(255)
text (character varying)(10000)
details(character varying)(1000)
status(integer)
Table customer
has the following columns:
username (character varying)(255)(primary key)
type(character varying)(500)
details(character varying)(10000)
status(integer)
active(boolean)
CodePudding user response:
You may be able to improve this by
- getting rid of your dependent subquery, and
- doing it in batches.
Try this to get a batch of one thousand message ids to delete. LEFT JOIN ... WHERE col IS NULL
is a way to write WHERE NOT EXISTS
without a dependent subqiery.
SELECT m.id
FROM message m
LEFT JOIN customer c ON m.username = c.username
WHERE c.username IS NULL
LIMIT 1000
Then, use the subquery in a statement. Repeat it until it deletes no rows.
DELETE
FROM account_message
WHERE message_id IN (
SELECT m.id
FROM message m
LEFT JOIN customer c ON m.username = c.username
WHERE c.username IS NULL
LIMIT 1000)
Doing this in batches of 1000 helps performance: it splits your operation into multiple reasonably sized database transactions.
CodePudding user response:
First, try to optimize the select inside brakets. Something like:
DELETE FROM account_message WHERE message_id in
(
select t2.id message t2
left join customer t1 on (t1.username = t2.username)
where t2.username is NULL
)