Home > Software design >  SQL—copy rows from one table to another on index change
SQL—copy rows from one table to another on index change

Time:05-02

State description

I have two databases, DB1 and DB2, that have the same table, Author, with the fields Author.ID and Author.AuthorName.

The DB1.Author has the AUTO_INCREMENT on its Author.ID field, while the DB2.Author does not have the AUTO_INCREMENT since it relies on the correctness of DB1 data.

Both tables have the PRIMARY index on Author.ID and a UNIQUE index on Author.AuthorName.

DB2.Author has rows copied from the DB1.Author.

Both databases use MariaDB version 10.6.7.


The problem

DB1 manager deleted some entries in the DB1.Author table, and then reordered indexes to have no gaps in index numbers. This means they might have had:

ID AuthorName
1 A
2 B
3 C

Then they deleted the row where the AuthorName was 'B':

ID AuthorName
1 A
3 C

And they finally updated the indexes to have no gaps (3-C changed to 2-C):

ID AuthorName
1 A
2 C

Now I need to find a way to copy such updated state of the rows from the DB1.Author to the DB2.Author without deleting everything from the DB2.Author table, so that I don't lose the data on CASCADE effects.

What is the best approach for this?


My shot

This is what I did, but it obviously cannot work, since in the case of duplicate key, it would attempt to create another duplicate key (duplicate ID 2 would try to INSERT duplicate value of 'C', since it already exists on ID 3):

INSERT INTO DB2.Author (ID, AuthorName)
SELECT DB1.Author.ID, DB1.Author.AuthorName FROM DB1.Author
ON DUPLICATE KEY UPDATE
    ID = DB1.Author.ID,
    AuthorName = DB1.Author.AuthorName;

Additional ways?

Other than the possible SQL query solution, are there any other ways to automatically update the table data in one database when the other database changes its data? Would need to replicate only some tables, while other, linked tables are different.

CodePudding user response:

You need to treat the DB2 table as completely wrong as the update of primary keys on the source table would have completely spoilt it.

  • Delete everything in DB2 table
  • Insert into DB2 table everything from DB1 table

Going forwards, without being condescending, the users with access to DB1 need training (or perhaps you need to reconsider the security against the DB). Updating a primary key value is a wrong thing to do. Gapless sequences is a silly thing to want, especially when you have known dependencies. In fact, gapless sequences is often listed as poor database security (as it makes it easy to just cycle through all data).

You probably want to consider commercial solutions for logical data replication. If they don’t support updates of primary keys then you can use that as a good enough reason not to.

I would invest time in making sure there’s no other logical corruptions of data like this.

CodePudding user response:

tl;dr your problem is your DB manager. The solution is to get him/her to undo the damage they caused by restoring the data to how it was before. Deleting rows is fine. Updating primary keys is never OK.

Do not create a work around or validate it by accommodating his/her mistake, because doing so will make it more likely that it will happen again.


Full answer.

Your actual problem is your "DB manager", who violated a fundamental rule of databases: Never update surrogate key values!

In your case it's even more tragic, because gaps in the ID column values don't matter in any way. If gaps do matter, you're in even worse shape. Allow me to explain...

The author's name is your actual identifier. We know this because there a unique constraint on it.

The ID column is a surrogate key, which are most conveniently implemented as an auto incrementing integer, but surrogate keys would work just as well if they were random (unique) numbers. Gaps, and even the choice of values themselves, are irrelevant to the effectiveness of surrogate keys.

  • Related