I have a BEFORE INSERT trigger that increments a batch_id number based on the latest entry at the time of insertion.
However, if records are inserted simultaneously, even in separate transactions, I end up with duplicate numbers.
Are there any alternatives or solutions to prevent these duplicates?
Here is my trigger:
CREATE OR REPLACE FUNCTION before_orders_insert_trigger()
RETURNS TRIGGER AS
$$
BEGIN
    NEW.batch_id := (
        SELECT COALESCE(MAX(batch_id), 0) + 1
        FROM orders
        WHERE company_id = NEW.company_id
        GROUP BY company_id
        LIMIT 1
    );
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER before_orders_insert_trigger
BEFORE INSERT ON orders
FOR EACH ROW EXECUTE PROCEDURE before_orders_insert_trigger();
CodePudding user response:
In MySQL, you can add a unique key over batch_id and the other columns that must not contain duplicates, then use INSERT IGNORE INTO to insert the data; rows that would violate the key are silently skipped instead of raising an error.
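A minimal sketch of that approach, assuming a composite unique key over (company_id, batch_id) on your orders table (the key name is just an example):
ALTER TABLE orders ADD UNIQUE KEY uq_company_batch (company_id, batch_id);

-- If a row with (company_id, batch_id) = (1, 42) already exists,
-- this statement is silently ignored instead of failing
INSERT IGNORE INTO orders (company_id, batch_id)
VALUES (1, 42);
Note that the conflicting row is dropped, not retried, so your application has to detect the skipped insert (e.g. via the affected-row count) and try again with a new batch_id.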
CodePudding user response:
The traditional solution in PostgreSQL is to use a SEQUENCE:
CREATE SEQUENCE s;
INSERT INTO orders (batch_id, ...) VALUES (NEXTVAL('s'), ...);
Alternatively, use SERIAL as the column type for your primary key; this creates a backing sequence and uses it automatically during INSERT.
CREATE TABLE orders (
batch_id SERIAL PRIMARY KEY,
...
);
MySQL does not have sequence objects; it implements a similar feature with the AUTO_INCREMENT option on the column.
CREATE TABLE orders (
batch_id INT AUTO_INCREMENT PRIMARY KEY,
...
);
Using a sequence or AUTO_INCREMENT is preferred over your method because it avoids the race condition you identified, where concurrent inserts read the same MAX(batch_id) and therefore produce duplicate values. The database increments the sequence atomically under an internal lock, so no two concurrent requests can be allocated the same value.
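If you want to keep your trigger, you can combine it with a sequence instead of the MAX() lookup; a minimal sketch (the sequence name orders_batch_id_seq is an assumption, and note that a single sequence numbers batches globally rather than per company_id):
CREATE SEQUENCE orders_batch_id_seq;

CREATE OR REPLACE FUNCTION before_orders_insert_trigger()
RETURNS TRIGGER AS
$$
BEGIN
    -- NEXTVAL increments atomically, so two concurrent inserts
    -- can never be handed the same value
    NEW.batch_id := NEXTVAL('orders_batch_id_seq');
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;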
Read the documentation pages I linked to for more details.