I am trying to run a simple operation where I have a table called insert_base and another table called insert_values (both with attributes a,b), and I want to just copy all the values in insert_values into insert_base whilst avoiding duplicates - that is, I do not want to insert a tuple already in insert_values, and also do not want to insert the same tuple twice into insert_values. I am currently looking at two query methods to do this:
INSERT INTO insert_base SELECT DISTINCT * FROM insert_values IV
WHERE NOT EXISTS
(SELECT * FROM insert_base IB WHERE IV.a = IB.a AND IV.b = IB.b);
and another that involves using a UNIQUE constraint on insert_base(a,b):
INSERT INTO insert_base SELECT * FROM insert_values ON CONFLICT DO NOTHING;
However, every time I run this the first query runs significantly faster (500ms compared to 1sec 300msec), and I am not sure why that is the case? Do the 2 queries not do roughly the same thing? I am aware that the UNIQUE constraint just puts an index on (a,b), but would that not make it faster than the first method? In fact, when I run the first method with an index on (a,b), it actually runs slightly slower than without any index (or unique constraint), which confuses me even more.
Any help would be much appreciated. I am running all this in postgresql by the way.
CodePudding user response:
Your second query does more work than the first. It attempts to insert every row from your insert_values
table, then when it sometimes hits a conflict it abandons the insert.
Your first query's WHERE clause filters out the rows that can't be inserted before attempting to insert them.