SELECT user_id FROM contacts WHERE phone = " 18888576309";
user_id | phone | created_at
-------------- --------------- ----------------------------
db0db85f2331 | 18888576309 | 2022-03-22 23:28:04.308612
db0db85f2331 | 18888576309 | 2022-03-22 23:28:04.310283
db0db85f2331 | 18888576309 | 2022-03-22 23:28:04.311891
This doesn't look very weird, until you realize that the model is defined like this:
class Contact(db.Model):
__tablename__ = 'contacts'
user_id = db.Column(VARCHAR(16), primary_key=True)
phone = db.Column(VARCHAR(64), primary_key=True)
created_at = db.Column(DateTime)
def __init__(self, user_id, phone):
self.user_id = user_id
self.contact_phone = phone
self.created_at = datetime.datetime.utcnow()
Note: There are two primary keys defined here (making, afaik, a composite key). They are user_id
and phone
.
I've been trying for longer than I care to admit to find the source of how this could happen, but the internet seems devoid of any information regarding this as a possibility (there are in fact several posts that say this is impossible!). The only unusual thing, and I believe this is the root of the problem (though that's purely an intuition) is that the method which saves these objects uses Session.bulk_save_objects(). Thus far I have:
- Checked all relevant migrations; anything altering the
contacts
table was well before 2022-03-22 - Checked the file-history of the files which both define the
Contact
and the file which handles the creation of contacts---there hasn't been a change since well before 2022-03-22 - Attempted to reproduce this issue locally a plethora of ways (including using
multiprocessing
to try to callSession.bulk_save_objects()
on the same list of objects simultaneously) - Read all I could in SQLAlchemy's docs regarding
Session.bulk_save_objects()
How might this type of thing happen at all? I was always under the impression that rows w/ identical primary keys just couldn't exist by virtue of some fundamental database logic, and maybe the only way it could happen is if two different processes added identical rows to the DB at the same time.
CodePudding user response:
There are two primary keys defined here (making, afaik, a composite key).
If there's indeed a PK on (user_id, phone)
, then your output doesn't make any sense. UNIQUE
indexes (backing PRIMARY KEY
and UNIQUE
constraints) are enforced at all times.
First make sure you are not barking up the wrong tree. Are there multiple tables named "contacts" or "Contacts"? In multiple schemas?
Check with:
SELECT oid, relnamespace::regnamespace AS schema, relname, relkind
FROM pg_class
WHERE relname ILIKE 'contacts';
See:
Then check the actual table definition in psql with \d contacts
- or whatever the actual, correct, fully-qualified tablename is.
If that doesn't clear the mist (like it probably will), you must be looking at a broken index. Fix with REINDEX
asap:
REINDEX TABLE contacts;
Or, at the very least, just the one index:
REINDEX INDEX contacts_pkey; -- use actual name
Or, more radically, with VACUUM FULL
(rewriting table and all indexes from scratch):
VACUUM FULL contacts;
Both block concurrent access.
I have never seen a broken index myself, but it happens - especially on older versions and/or faulty hardware. It would be advisable to pin down the cause ...