I have a BEFORE INSERT trigger that increments a batch_id number based on the latest entry at the time of insertion.
However, if records are inserted simultaneously, even in separate transactions, I end up with duplicate numbers.
Are there any alternatives or solutions to prevent these duplicates?
Here is my trigger:
CREATE OR REPLACE FUNCTION before_orders_insert_trigger()
RETURNS TRIGGER AS
$$
BEGIN
    NEW.batch_id := (
        SELECT COALESCE(MAX(batch_id), 0) + 1
        FROM orders
        WHERE company_id = NEW.company_id
        GROUP BY company_id
        LIMIT 1
    );
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER before_orders_insert_trigger
BEFORE INSERT ON orders
FOR EACH ROW EXECUTE PROCEDURE before_orders_insert_trigger();
CodePudding user response:
In MySQL, you can add a unique key over batch_id and the other columns that must not contain duplicates, then use INSERT IGNORE INTO to insert the data; rows that would violate the key are silently skipped instead of raising an error.
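A minimal sketch of that approach, assuming a composite unique key over (company_id, batch_id) on your orders table (the key name is just an example):
ALTER TABLE orders ADD UNIQUE KEY uq_company_batch (company_id, batch_id);

-- If a row with (company_id, batch_id) = (1, 42) already exists,
-- this statement is silently ignored instead of failing
INSERT IGNORE INTO orders (company_id, batch_id)
VALUES (1, 42);
Note that the conflicting row is dropped, not retried, so your application has to detect the skipped insert (e.g. via the affected-row count) and try again with a new batch_id.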
CodePudding user response:
The traditional solution in PostgreSQL is to use a SEQUENCE:
CREATE SEQUENCE s;
INSERT INTO orders (batch_id, ...) VALUES (NEXTVAL('s'), ...);
Alternatively, use SERIAL as the column type for your primary key; this creates a backing sequence and uses it automatically during INSERT.
CREATE TABLE orders (
batch_id SERIAL PRIMARY KEY,
...
);
MySQL does not have sequence objects; it implements a similar feature with the AUTO_INCREMENT option on the column.
CREATE TABLE orders (
batch_id INT AUTO_INCREMENT PRIMARY KEY,
...
);
Using a sequence or AUTO_INCREMENT is preferred over your method because it avoids the race condition you identified, where concurrent inserts read the same MAX(batch_id) and therefore produce duplicate values. The database increments the sequence atomically under an internal lock, so no two concurrent requests can be allocated the same value.
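If you want to keep your trigger, you can combine it with a sequence instead of the MAX() lookup; a minimal sketch (the sequence name orders_batch_id_seq is an assumption, and note that a single sequence numbers batches globally rather than per company_id):
CREATE SEQUENCE orders_batch_id_seq;

CREATE OR REPLACE FUNCTION before_orders_insert_trigger()
RETURNS TRIGGER AS
$$
BEGIN
    -- NEXTVAL increments atomically, so two concurrent inserts
    -- can never be handed the same value
    NEW.batch_id := NEXTVAL('orders_batch_id_seq');
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;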
Read the documentation pages I linked to for more details.