I need to import data from an external web service into my MySQL (5.7) database. The problem is that I have to split the data into two tables. For example, I have the tables
CREATE TABLE a (
  id INT PRIMARY KEY AUTO_INCREMENT,
  name VARCHAR(100)
);
CREATE TABLE b (
  id INT PRIMARY KEY AUTO_INCREMENT,
  a_id INT,
  name VARCHAR(100)
);
Now I have to insert multiple rows into table b for each row in table a (1:n). As I do not know the id of a row in table a before inserting it, the only way seems to be to insert one row into table a, get the last inserted id, and then insert all connected entries into table b.
But my database is very slow when I insert row by row: it takes more than an hour to insert about 35,000 rows into table a and 120,000 into table b. If I do batch inserts of about 1,000 rows on table a (just for testing, without filling table b), it is incredibly faster (less than 3 minutes).
I guess there must be a way to speed up my import.
Thanks for your help.
CodePudding user response:
I presume you are working with a programming language that drives your inserts; you need to be able to program this sequence of operations.
First, you need to use the following sequence to put a row into a and its dependent rows into b. It uses LAST_INSERT_ID() to handle a_id; that's faster and much more robust than querying the table to find the correct id value.
INSERT INTO a (name) VALUES ('Claus');
SET @a_id = LAST_INSERT_ID();
INSERT INTO b (a_id, name) VALUES (@a_id, 'von');
INSERT INTO b (a_id, name) VALUES (@a_id, 'Bönnhoff');
The trick is to capture the a.id value in the session variable @a_id and then reuse it for each dependent INSERT. (I have turned you into an aristocrat to illustrate this, sorry :-)
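If several dependent rows share the same @a_id, you can also insert them with a single multi-row INSERT, which saves one round trip per row. A minimal sketch of the same example, using the schema above:
INSERT INTO a (name) VALUES ('Claus');
SET @a_id = LAST_INSERT_ID();
-- both dependent rows in one statement, reusing the captured id
INSERT INTO b (a_id, name) VALUES
  (@a_id, 'von'),
  (@a_id, 'Bönnhoff');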
Second, you should keep this in mind: INSERTs are cheap, but transaction COMMITs are expensive. That's because MySQL (InnoDB, actually) does not make your changes permanent until COMMIT. Unless you manage your transactions explicitly, the DBMS uses a feature called "autocommit", in which it immediately commits each INSERT (or UPDATE or DELETE).
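If your client leaves autocommit switched on (the default), you can also turn it off for the session and commit explicitly. This sketch is equivalent to wrapping the statements in START TRANSACTION ... COMMIT, shown with the same example names:
SET autocommit = 0;  /* stop committing after every single statement */
INSERT INTO a (name) VALUES ('Claus');
SET @a_id = LAST_INSERT_ID();
INSERT INTO b (a_id, name) VALUES (@a_id, 'von');
INSERT INTO b (a_id, name) VALUES (@a_id, 'Bönnhoff');
COMMIT;              /* make the whole batch permanent in one go */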
Fewer transactions get you better speed. Therefore, to improve bulk-loading performance you want to bundle 100 or so INSERTs into a single transaction. (The exact number doesn't matter very much.) You can do something like this:
START TRANSACTION; /* start an insertion bundle */
INSERT INTO a (name) VALUES ('Claus');
SET @a_id = LAST_INSERT_ID();
INSERT INTO b (a_id, name) VALUES (@a_id, 'von');
INSERT INTO b (a_id, name) VALUES (@a_id, 'Bönnhoff');
INSERT INTO a (name) VALUES ('Oliver');
SET @a_id = LAST_INSERT_ID();
INSERT INTO b (a_id, name) VALUES (@a_id, 'Jones');
... more INSERT operations ...
INSERT INTO a (name) VALUES ('Jeff');
SET @a_id = LAST_INSERT_ID();
INSERT INTO b (a_id, name) VALUES (@a_id, 'Atwood');
COMMIT; /* commit the bundle */
START TRANSACTION; /* start the next bundle */
INSERT INTO a (name) VALUES ('Joel');
SET @a_id = LAST_INSERT_ID();
INSERT INTO b (a_id, name) VALUES (@a_id, 'Spolsky');
... more INSERT operations ...
COMMIT; /* finish the bundle */
(All of this, except LAST_INSERT_ID(), works on any SQL-based RDBMS; each make of RDBMS has its own way of handling generated IDs.)