I have a file containing 40 gb of rows data, but it is in raw text format (no commands in sql syntax), it's always from one to unlimited rows in one line separated by ; and rows are array in () brackets with values separated with ,
examples of lines:
(1,'text',NULL,NULL);(2,'string',NULL,1);
(12,'date',123,NULL);(2,'foo',11,15);
Is there a way to import this data using mysqlimport or LOAD DATA statement without parsing data with programming languages? If not, what are the ways to parse it fast, preferrably in a few minutes, not days, because when I edit this dump manually with EmEditor (which supposed to be streaming editor but it lags anyway) it takes half a hour to save even a small change...
I have tried ENCLOSED BY option but it does not have brackets option
LOAD DATA
INFILE "path"
INTO TABLE test
CHARACTER SET utf8
FIELDS
TERMINATED BY ','
ENCLOSED BY ')'
LINES
TERMINATED BY ';'
CodePudding user response:
MySQL has no data import tool that understands this format. LOAD DATA can't parse it.
I agree trying to edit a 40GB file is beyond the capacity of almost any editor (emacs users can keep their superiority to themselves at this point).
You could write some code to parse it, but I think an easier solution would be to use sed
which is a true streaming editor (the name sed
literally means "stream editor").
sed -e 's/;/;\n/g' myfile.txt |
sed -e '/;/s/^/INSERT INTO `test` VALUES /g' |
mysql ...options...
Of course, try this on a small sample file first, to make sure it works the way you expect it to.
CodePudding user response:
You could use a tool like gawk.
gawk -F; "{ for(i=1;i<=NF;i ) if($i!=\"\") print \"INSERT INTO table VALUES \"$i\";\" }NR00==1{ print \"COMMIT; START TRANSACTION;\"}" export.txt >import.sql
This will create an output file like:
INSERT INTO table VALUES (1,'text',NULL,NULL);
INSERT INTO table VALUES (2,'string',NULL,1);
COMMIT; START TRANSACTION;
INSERT INTO table VALUES (12,'date',123,NULL);
INSERT INTO table VALUES (2,'foo',11,15);
With the COMMIT; START TRANSACTION
every 1000 lines (increase that number when needed)
gawk can be downloaded (i.e.) here: http://gnuwin32.sourceforge.net/packages/gawk.htm
- Only downside on Windows is that you need to escape those
"
, using a\
. - I cannot guess if this is quick enough, increasing the 1000 might influence import time.