INSERT with gap-less serial numbers per partition

Time:05-16

I have a table with a trigger that sets an order number per customer. I constantly get unique key violations in that table:

SQLSTATE[23505]: Unique violation: 7
ERROR: duplicate key value violates unique constraint "order_customer_unique"

The trigger is defined as BEFORE INSERT ON orders FOR EACH ROW.

Is there any technique to solve this concurrency problem?
Ideally, the number per customer is sequential and does not skip numbers.

CREATE OR REPLACE FUNCTION "public"."trigger_set_order_customer_number"()
  RETURNS "pg_catalog"."trigger" AS
$BODY$
BEGIN
   NEW.sequence = (
      SELECT COALESCE(MAX(orders.sequence), 0) + 1
      FROM orders
      WHERE orders.customer_type = NEW.customer_type
      AND orders.customer_id = NEW.customer_id
      AND orders.model_id = NEW.model_id LIMIT 1);
   RETURN NEW;
END;
$BODY$
  LANGUAGE plpgsql VOLATILE
  COST 100;

CodePudding user response:

Before you proceed, consider not doing this. Gap-less sequential numbers per partition under concurrent write load have many implications and can become a costly endeavor. Much depends on details of your setup and requirements. See:

The trigger you have is inherently unsafe against concurrent writes. Conflicts are to be expected, as each transaction sees its own snapshot of the table, and the calculated "next number" may end up being the same for multiple concurrent inserts.

You could escalate to SERIALIZABLE transaction isolation. But that's expensive, and you have to prepare for serialization failures, just like you have to prepare for unique violations now. Catch-22.
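If you did go that route, the shape would be roughly the following (a sketch, not the answer's recommendation; the retry loop lives in application code and reacts to SQLSTATE 40001, serialization_failure):

```sql
-- run the insert under SERIALIZABLE isolation
BEGIN ISOLATION LEVEL SERIALIZABLE;

INSERT INTO orders (customer_type, customer_id, model_id /* , ... */)
VALUES (1, 42, 7 /* , ... */);   -- trigger computes NEW.sequence

COMMIT;
-- if COMMIT (or any statement) fails with SQLSTATE 40001,
-- the application must retry the whole transaction from BEGIN
```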

A cheaper approach is to add a parent table with exactly one row per partition (per "unique customer account"). Then you can lock a row there so that each unique customer account can only be manipulated by a single transaction at a time. That's typically a viable path and performs better. But it all depends ... Related:

While we are at it, we can put an order_count into that parent table to make everything simpler and cheaper. It could look like this:

CREATE TABLE unique_customer_account(
  customer_id   int  -- replace with actual data types
, customer_type int
, model_id      int
, order_count   int NOT NULL DEFAULT 0
, PRIMARY KEY (customer_id, customer_type, model_id)
);

CREATE OR REPLACE FUNCTION public.trigger_set_order_customer_number()
  RETURNS trigger
  LANGUAGE plpgsql AS
$func$
BEGIN
   -- this locks the row and makes concurrent transactions targeting
   -- the same unique_customer_account wait till this one is done
   INSERT INTO unique_customer_account AS u
          (    customer_type,     customer_id,     model_id, order_count)
   VALUES (NEW.customer_type, NEW.customer_id, NEW.model_id, 1)
   ON     CONFLICT (customer_type, customer_id, model_id) DO UPDATE  -- ①
   SET    order_count = u.order_count + 1    -- ②
   RETURNING u.order_count
   INTO   NEW.sequence;  -- ③

   IF NOT FOUND THEN
      RAISE EXCEPTION 'Row in "unique_customer_account" not found!'; -- ④
   END IF;

   RETURN NEW;
END
$func$;

CREATE TRIGGER orders_insbef
BEFORE INSERT ON orders
FOR EACH ROW EXECUTE FUNCTION public.trigger_set_order_customer_number();

db<>fiddle here

A quick local test with concurrent inserts also works as advertised.

① The UPDATE implicitly takes out an exclusive lock on the row, same as SELECT ... FOR UPDATE.

② No COALESCE since order_count is defined NOT NULL.

③ We can conveniently assign the incremented counter into NEW.sequence directly.

④ The row in unique_customer_account must exist. I built an UPSERT into the trigger. Or you might handle it separately. Either way, there is another (less common) race condition if multiple concurrent transactions want to enter a new row in unique_customer_account at the same time, which can result in no row from the UPSERT. Can be solved. See:

A simple and clean solution to that next problem would be to INSERT ... ON CONFLICT DO NOTHING new rows in unique_customer_account in a separate command before inserting any orders. Can be in the same transaction, but not a CTE of the same command.

Then you can assume the row in unique_customer_account exists and reduce to a plain UPDATE in the trigger. All race conditions gone. (Except exotic ones.)
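That reduced variant could be sketched like this (same table and column names as the UPSERT version above; the one-time INSERT runs as a separate command, e.g. when a customer account is created):

```sql
-- separate command, before inserting any orders for the account:
INSERT INTO unique_customer_account (customer_type, customer_id, model_id)
VALUES (1, 42, 7)          -- actual key values
ON CONFLICT DO NOTHING;    -- safe under concurrency

-- the trigger function then reduces to a plain UPDATE:
CREATE OR REPLACE FUNCTION public.trigger_set_order_customer_number()
  RETURNS trigger
  LANGUAGE plpgsql AS
$func$
BEGIN
   UPDATE unique_customer_account u
   SET    order_count = u.order_count + 1
   WHERE  u.customer_type = NEW.customer_type
   AND    u.customer_id   = NEW.customer_id
   AND    u.model_id      = NEW.model_id
   RETURNING u.order_count
   INTO   NEW.sequence;

   IF NOT FOUND THEN  -- the parent row is assumed to exist by now
      RAISE EXCEPTION 'Row in "unique_customer_account" not found!';
   END IF;

   RETURN NEW;
END
$func$;
```

The UPDATE takes the same row lock as before, so concurrent inserts for the same account still serialize on the parent row.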

Like I said, many intricate details ...

CodePudding user response:

YES! Do not use a trigger. Define your sequence column as integer GENERATED ALWAYS AS IDENTITY. That eliminates the need for the trigger completely, as Postgres handles assigning the value; you do not - indeed you cannot. Just FYI: assigning a value as MAX + 1 is virtually guaranteed to create duplicates under concurrent load. Unfortunately, this will lose numbers, as there will be gaps at some point. Generating gap-less sequences is quite difficult, and for virtually no gain. Unless there is a legal requirement, it is best to just get used to having missing values and move on.
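A minimal sketch of that approach (column names carried over from the question; note that an identity column numbers rows across the whole table, not per customer, and permits gaps):

```sql
CREATE TABLE orders (
  sequence      bigint GENERATED ALWAYS AS IDENTITY  -- Postgres assigns this
, customer_id   int NOT NULL
, customer_type int NOT NULL
, model_id      int NOT NULL
  -- more columns ...
);

-- plain insert; do not mention "sequence" at all:
INSERT INTO orders (customer_type, customer_id, model_id)
VALUES (1, 42, 7);
```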
