Is it bad practice to update part of primary key?-CodePudding

I have the pretty bad headache about primary key based on the three rows:

ID DATE_START DATE_END

They describes the version of object. Version of object is the combination of ID, D_S, D_E and it has no any breaks inside of timeline.

There is a few states of object:

State 1 (latest version):

ID        DATE_START            DATE_END
1         01-01-2022 00:00:00   31-12-9999 00:00:00

State 2 (insert new version)

ID        DATE_START            DATE_END
1         01-01-2022 00:00:00   01-02-2022 23:59:59
1         02-02-2022 00:00:00   31-12-9999 00:00:00

So you can see each time user creates new version, we should update DATE_END on previous, like this:

DATE_END = DATE_START - 1 Second

I'm almost sure that updating part of PK is the bad practice. I'm on to drop DATE_END from PK at all, in case of there's no any breaks inside timeline.

But my colleagues try to convince me that we should have DATE_END - it's pretty slow to search version on some date cuz we should compare DATE_START of each row against to comparing DATE_START and DATE_END in one row.

Could someone explain me who's wrong and what is the best solution in that case?

CodePudding user response：

I'd say that modifying primary key really is bad practice. Mind you, primary keys often are referenced by foreign keys; even if you tried to update such a primary key (referenced by other foreign key(s)), you won't be able to do it because Oracle will complain that - most probably - parent key doesn't exist. What will you do then? Disable foreign key constraint, update primary key value, update foreign key(s), enable foreign key constraints and hope that nobody/nothing did something that will violate referential integrity during that time (with disabled foreign keys).

So, why wouldn't you rather use a surrogate primary key? In modern Oracle database versions, use an identity column. If your Oracle database doesn't support it, use a sequence. Never modify its value, reference it from other tables and maintain referential integrity. Then, if you want, update those end dates, if you must, as it won't break anything.

CodePudding user response：

You could only store the start date and calculate the end-date; or you could store the end date and have to UPDATE the last record.

The first option incurs greater relative cost when you SELECT the output and a lower cost on INSERT;
The second option incurs a greater relative cost when you INSERT (and UPDATE) and a lower cost on SELECT.

So if all you are doing is INSERTing and SELECTing then you can look at which option you do more and prioritise the lower-cost option for that.

However, if the primary key is also the target of referential constraints then your costs are exponentially higher for having to modify an existing primary key are you will also have to modify all the columns referenced through the constraints and that will quickly become a nightmare.

What you should do is divorce the primary key used in referential constraints from the date columns that are going to be updated and use a surrogate primary key:

CREATE TABLE table_name (
  key        NUMBER
             GENERATED ALWAYS AS IDENTITY
             PRIMARY KEY,
  id         NUMBER
             NOT NULL,
  start_date DATE
             NOT NULL,
  end_date   DATE
             NOT NULL,
  CONSTRAINT table_name__id__sd__ed__unique UNIQUE (id, start_date, end_date)
);

or:

CREATE TABLE table_name (
  key        NUMBER
             GENERATED ALWAYS AS IDENTITY
             PRIMARY KEY,
  id         NUMBER
             NOT NULL,
  start_date DATE
             NOT NULL,
  CONSTRAINT table_name__id__sd__ed__unique UNIQUE (id, start_date)
);

Then you do not need to modify any referential constraints and you can evaluate whether you want to have greater costs during INSERTion or SELECTion.

Whichever of those two you prioritise, that decision is a business decision and there is no "best" choice as what might work in one situation will not work for another situation.