Here's the statement:
LOAD DATA INFILE '/var/lib/mysql-files/test.csv'
INTO TABLE test
FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '\\'
Here's a line that imports fine:
1450327840667357185, 1, "This is my text"
If the text field has a comma though, it fails. For example, see this line:
1450327840667357185, 1, "This is my text, with a comma in it"
Why does it fail?
If I escape the comma (by putting a backslash in front of it) it works fine. This does not make sense though. I have stipulated that the fields may be enclosed by double quotes, so why doesn't it accept all the content therein as a content of the field?
If I have to reprocess all the text to quote comma's, it would be a big task.
CodePudding user response:
Your CSV has spaces after the ,
separator.
I don't have MySQL to play with, but I expect that means it's seeing those spaces as part of the values, causing all sorts of problems.
This means that the CSV line is read as...
- VAL =
1450327840667357185
- SEP =
,
- VAL =
1
- SEP =
,
- VAL =
"This is my text
- SEP =
,
- VAL =
with a comma in it"
Note that the leading spaces are included in the values.
This means that the third values does not start with "
(it starts with a space), so the ENCLOSED BY '"'
isn't doing anything. This means that the parser treats the string as unquoted, which in turn means that the subsequent ,
is seen as a separator.
Try TERMINATED BY ', '
...
This will mean that the the parser now reads the line as...
- VAL =
1450327840667357185
- SEP =
,
- VAL =
1
- SEP =
,
- VAL =
"This is my text, with a comma in it"
Note that the third value now does start with "
(because the leading space is now part of the separator), and so is a quoted string and so the subsequent ,
is not treated as a separator.
CodePudding user response:
Use Input Preprocessing. Load the whole row into a variable then parse it to separate columns.
LOAD DATA INFILE '/var/lib/mysql-files/test.csv'
INTO TABLE test
-- FIELDS TERMINATED BY ''
(@tmp)
SET column1 = SUBSTRING_INDEX(@tmp, ',', 1),
@tmp = TRIM(SUBSTRING(@tmp FROM 1 LOCATE(',', @tmp))),
column2 = SUBSTRING_INDEX(@tmp, ',', 1),
column3 = TRIM(BOTH '"' FROM TRIM(SUBSTRING(@tmp FROM 1 LOCATE(',', @tmp))))