Home > Software engineering >  How do I write a custom Postgresql constraint on a table that requires checking data from two column
How do I write a custom Postgresql constraint on a table that requires checking data from two column

Time:10-25

I have a table where I am mapping ids to one another. The table looks more or less like this:

id1 id2
123 234
345 456
567 678
001 001

The constraint I am trying to add is that if an id already exists in column id2 then it is not allowed to be entered into column id1. There's one small exception to that though in that if id1 = id2 then it's allowed (see the last row). Here is the constraint I have written for it (the table above is referenced as demo_table in the function):

CREATE OR REPLACE FUNCTION public.chain_mapping_constraint(pid1 BIGINT, pid2 BIGINT)
RETURNS bool AS
$$

SELECT CASE WHEN pid1 = pid2 THEN TRUE
            WHEN (SELECT COUNT(t.*) FROM demo_table t WHERE t.id2 = pid1) > 0 THEN FALSE
            ELSE TRUE
       END;
$$  LANGUAGE sql STABLE

Then after creating the table I add the constraint via:

ALTER TABLE table ADD CONSTRAINT chain_mapping_constraint
CHECK (public.chain_mapping_constraint(id1, id2)) NOT VALID;

This works if you insert the records one at a time, but I've found that inserting multi-line queries all at once causes it to fail. I believe it is failing because even though we might violate the condition in the query itself, that condition is not in the table yet and so it passes. Here's an example of the code I'm using to insert that causes the constraint to fail:

INSERT INTO demo_table(id1, id2)
SELECT id1, id2
FROM some_other_table

So when a multi-line query alone violates the constraint, the function looks for this issue in the table rather than looking at the query itself, doesn't see it in the table, and then inserts the problematic records.

Questions:

  1. How do I make my constraint consider the queries I insert rather than the table?
  2. Is there a better way to do this?

CodePudding user response:

First, addressing the elephant in the room.

  1. With what you have described, INSERT INTO demo_table values(1,2),(2,3) would fail (I will explain later how to tweak the function to achieve this) whereas INSERT INTO demo_table values(2,3),(1,2) would succeed.
    In SQL, these 2 queries are identical: insert the 2 same records in 1 atomic operation and once saved in the table, SQL hides away the order in which they are physically stored (which does not always match their insertion chronological order anyway); there is no good reason for one to work and the other to fail.
  2. You also need to decide what to do about records with ID1 = ID2.
    I get you do not want any new record to block itself from being inserted; the consequence of your current code seems to make it possible to insert any number of e.g. (1,1). I am not sure this is what you intended to do.

To make your constraint valid, you need to bother reject records when ID1 equals a preexisting ID2 but also when ID2 matches a preexisting ID1.

You need exclusion constraints.

Solution1: all in 1 table

If ID1 <= ID2, the easiest approach is to do:

CREATE EXTENSION IF NOT EXISTS BTREE_GIST;
CREATE TABLE demo_table(
    ID1 BIGINT NOT NULL,
    ID2 BIGINT NOT NULL,
    CHECK (ID1 <= ID2),
    EXCLUDE USING gist (int8range(ID1, ID2) WITH -|-) WHERE (id1<>id2)
);

We cheat a little by turning the 2 fields into 1 value (type range) and by using the -|- operator, that ensures no 2 ranges share opposite bounds (meaning the lower bound of 1 range must be different from the upper bound of another range).

Solution 2: using a view

If there is no constraints linking ID1 and ID2 together, you can trick the system with a view.

First, we need a table that will contain 1 id field and 2 helper fields.

CREATE EXTENSION IF NOT EXISTS BTREE_GIST;
CREATE TABLE demo_table_flat(
JoinValue BIGSERIAL /* Creates the sequence demo_table_flat_joinvalue_seq*/ ,
IDField SMALLINT,
ID BIGINT NOT NULL,
PRIMARY KEY (JoinValue, IDField),
CHECK (IDFIELD IN (0,1)),
EXCLUDE USING GIST (JoinValue WITH <>, IDField WITH <>, ID WITH =)
);

JoinValue and IDField are going to be used to create the view.

CREATE VIEW demo_view AS
SELECT t1.id AS ID1, t2.id AS ID2
FROM (SELECT JoinValue, ID FROM demo_table_flat WHERE IDField = 0) t1
JOIN (SELECT JoinValue, ID FROM demo_table_flat WHERE IDField = 1) t2
    ON t1.JoinValue = t2.JoinValue;

With the view definition in mind, let us analyze the exclusion constraint.
Fields coming from 2 different records in demo_view (JoinValue WITH <>) comparing ID1 from a record with ID2 from another (IDField WITH <>) trigger the exclusion constraint if they are equal (ID with =).

Now, the issue is, obviously, that we cannot simply insert data in the view. We solve this with a rule (see also here).

CREATE RULE "demo_view_insert" AS ON INSERT TO demo_view
DO INSTEAD
INSERT INTO demo_table_flat
SELECT UNNEST(ARRAY[nextval('demo_table_flat_joinvalue_seq'),currval('demo_table_flat_joinvalue_seq')]), UNNEST(ARRAY[0,1]), UNNEST(ARRAY[NEW.ID1, NEW.ID2])

Ideally, you also want to create an UPDATE and a DELETE rule.


I am writing what follows only for the sake of giving a complete explanation. You can implement it to determine the exact behavior you want to get (see my initial 2 points) but not as production code.

  1. Change STABLE to VOLATILE
    This forces the function to be called for every record you try to insert.

  2. Use the alternative definition:

    CREATE OR REPLACE FUNCTION public.chain_mapping_constraint(pid1 BIGINT, pid2 BIGINT)
    RETURNS bool AS
    $$
    SELECT NOT EXISTS(SELECT 1 FROM demo_table t WHERE t.id2 = pid1)
           END;
    $$  LANGUAGE sql VOLATILE
    
  • Related