I have the pretty bad headache about primary key based on the three rows:
ID DATE_START DATE_END
They describes the version of object. Version of object is the combination of ID, D_S, D_E and it has no any breaks inside of timeline.
There is a few states of object:
State 1 (latest version):
ID DATE_START DATE_END
1 01-01-2022 00:00:00 31-12-9999 00:00:00
State 2 (insert new version)
ID DATE_START DATE_END
1 01-01-2022 00:00:00 01-02-2022 23:59:59
1 02-02-2022 00:00:00 31-12-9999 00:00:00
So you can see each time user creates new version, we should update DATE_END on previous, like this:
DATE_END = DATE_START - 1 Second
I'm almost sure that updating part of PK is the bad practice. I'm on to drop DATE_END from PK at all, in case of there's no any breaks inside timeline.
But my colleagues try to convince me that we should have DATE_END - it's pretty slow to search version on some date cuz we should compare DATE_START of each row against to comparing DATE_START and DATE_END in one row.
Could someone explain me who's wrong and what is the best solution in that case?
CodePudding user response:
I'd say that modifying primary key really is bad practice. Mind you, primary keys often are referenced by foreign keys; even if you tried to update such a primary key (referenced by other foreign key(s)), you won't be able to do it because Oracle will complain that - most probably - parent key doesn't exist. What will you do then? Disable foreign key constraint, update primary key value, update foreign key(s), enable foreign key constraints and hope that nobody/nothing did something that will violate referential integrity during that time (with disabled foreign keys).
So, why wouldn't you rather use a surrogate primary key? In modern Oracle database versions, use an identity column. If your Oracle database doesn't support it, use a sequence. Never modify its value, reference it from other tables and maintain referential integrity. Then, if you want, update those end dates, if you must, as it won't break anything.
CodePudding user response:
You could only store the start date and calculate the end-date; or you could store the end date and have to UPDATE
the last record.
- The first option incurs greater relative cost when you
SELECT
the output and a lower cost onINSERT
; - The second option incurs a greater relative cost when you
INSERT
(andUPDATE
) and a lower cost onSELECT
.
So if all you are doing is INSERT
ing and SELECT
ing then you can look at which option you do more and prioritise the lower-cost option for that.
However, if the primary key is also the target of referential constraints then your costs are exponentially higher for having to modify an existing primary key are you will also have to modify all the columns referenced through the constraints and that will quickly become a nightmare.
What you should do is divorce the primary key used in referential constraints from the date columns that are going to be updated and use a surrogate primary key:
CREATE TABLE table_name (
key NUMBER
GENERATED ALWAYS AS IDENTITY
PRIMARY KEY,
id NUMBER
NOT NULL,
start_date DATE
NOT NULL,
end_date DATE
NOT NULL,
CONSTRAINT table_name__id__sd__ed__unique UNIQUE (id, start_date, end_date)
);
or:
CREATE TABLE table_name (
key NUMBER
GENERATED ALWAYS AS IDENTITY
PRIMARY KEY,
id NUMBER
NOT NULL,
start_date DATE
NOT NULL,
CONSTRAINT table_name__id__sd__ed__unique UNIQUE (id, start_date)
);
Then you do not need to modify any referential constraints and you can evaluate whether you want to have greater costs during INSERT
ion or SELECT
ion.
Whichever of those two you prioritise, that decision is a business decision and there is no "best" choice as what might work in one situation will not work for another situation.