Home > Software design >  Dealing with Foreign Key constraints when loading large data from 2 CSV Files
Dealing with Foreign Key constraints when loading large data from 2 CSV Files

Time:10-25

I'm working on a project with mySQL(8.0.19) The task requires the creation of various tables, with appropriate referential constraints and then filling them with large amounts of data to perform queries and other functions.

I've got 2 large CSV files with around 5000 rows of data. Table B has a Foreign Key pointing to Table A, i.e. when creating Table B:

CREATE TABLE B(
       abc VARCHAR(20),
       sampleName VARCHAR(20),
       ...
       PRIMARY KEY(abc)
       FOREIGN KEY(sampleName) REFERENCES A(Name)
           ON DELETE CASCADE ON UPDATE RESTRICT
       );

Now, the problem I'm encountering is that there are several values for attribute sampleName in the second CSV file, but many of them don't have a corresponding Name in the first CSV file. So I'm encountering a FK constraint error when I attempt to load the data into Table B.

I've been searching endlessly for a method in excel to compare two different columns and remove any non-matching values but so far to no avail. One option I have considered is to turn off FK checks completely, load the data into Table 2 and then write up a DELETE query to remove any non-matching values. However, I'd much rather prefer if I could alter my INSERT query in some way so that a tuple is only added when sampleName from CSV2 has an equal value for Name in Table A.

Edit: I'm using python to go through the CSVS and add data to my tables, the query is as follows:

Q = "INSERT INTO TABLE(a, b, c, d, e) 
       VALUES(%s, %s, %s, %s, %s)"
    
    ... #CSV processing into X, MYSQL connection

   cursor1.execute(Q, X)

Any help would be greatly appreciated.

CodePudding user response:

Use the IGNORE option when you insert to table B. This means don't throw an error if the insert fails, just keep going. If the insert is inserting multiple rows, just skip the rows that fail, and insert those that succeed.

LOAD DATA INFILE has an IGNORE option too.

Demo, tested on MySQL 8.0.31:

mysql> create table A ( name varchar(20) primary key);
mysql> insert into A values ('bill');

mysql> insert into b values ('abc1', 'bill'), ('abc2', 'john');
ERROR 1452 (23000): Cannot add or update a child row: a foreign key constraint fails (`test`.`b`, CONSTRAINT `b_ibfk_1` FOREIGN KEY (`sampleName`) REFERENCES `a` (`name`) ON DELETE CASCADE ON UPDATE RESTRICT)

This is expected, of course. The statement fails, and the table is empty.

mysql> select * from b;
Empty set (0.00 sec)

Now test with ignore:

mysql> insert ignore into b values ('abc1', 'bill'), ('abc2', 'john');
Query OK, 1 row affected, 1 warning (0.00 sec)
Records: 2  Duplicates: 1  Warnings: 1

mysql> select * from b;
 ------ ------------ 
| abc  | sampleName |
 ------ ------------ 
| abc1 | bill       |
 ------ ------------ 

The row that succeeds was inserted, and the row that had the error is skipped.

CodePudding user response:

You can load the data into a temporary table that has no foreign key. Then use a JOIN with table A to insert into the real table B.

INSERT INTO B
SELECT tb.*
FROM tempB AS tb
JOIN TableA AS a on a.name = tb.sampleName
  • Related