I have a postgres database in which I'm refreshing data periodically. Most of the time it works, but sometimes I have issues with a unique index.
Minimal example
create table test_table (
id int
);
create unique index test_table_unique on test_table(id);
(I know, in this case it should be a primary key, but for the sake of example, please bear with me.)
Now, every hour, I do something like this:
begin;
delete from test_table;
insert into test_table (id) values (1), (2), (3)...
commit;
As I said, most of the time it will just work fine. However, sometimes postgres complains about a duplicate entry in the unique index.
error: duplicate key value violates unique constraint test_table_unique
detail: "Key (id)=(2) already exists."
My real database
In my actual table, I'm using JSON payloads, and the unique index is made on fields of that json payload. In particular, the error details is as follows:
create table if not exists source (
id serial primary key,
payload jsonb not null
);
create unique index if not exists source_index_and_id on source ((payload->>'_index'), (payload->>'_id'));
error details: "Key ((payload ->> '_index'::text), (payload ->> '_id'::text))=(companies, AC9860) already exists."
I'm confident there is no actual duplicate data. I'm deleting everything for a particular ->>_index
, and the ->>_id
is unique in my source data.
My understanding is that if I delete rows from a table, the indices will be updated before the next statements are executed. But it doesn't seem to be the case. I've found that it helps (not sure if it actually solves the issue) to commit the changes after the delete, and before the inserts.
begin;
delete...
commit;
begin;
insert...
commit;
What's happening here?
CodePudding user response:
The only options how this could happen are
the deleting transaction rolled back
concurrent transactions inserted new rows after you deteted the original ones
the inserting transaction inserts the same key twice
the inserting transaction is accidentally run before the deleting one
CodePudding user response:
PostGreSQL is not a real relational DBMS and does not match rule 7 of Codd's Rule about functional set operations.
Contrary to other RDBMS PostGreSQL delete rows one by one and this lack of functionality conduct to have sometime fantom key violation.
In my paper that compare PostGreSQL to MS SQL Server I made a test that show this evidence (§ 7 – The hard way to udpates unique values)