I am using Azure Synapse and I m querying the following CSV file:
1,"A","G"
2,"B","H"
3,"C","I"
4,"D","J"
5,"E,F","K"
Here is the T-SQL query:
SELECT *
FROM
OPENROWSET(
BULK 'taxi/raw/Sample.csv',
DATA_SOURCE = 'nyc_taxidata',
FORMAT = 'CSV',
PARSER_VERSION = '2.0',
FIRSTROW=2,
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n',
FIELDQUOTE = '"'
)
AS [result]
Here is the output:
C1 C2 C3
1 A G
2 B H
3 C I
4 D J
5 E,F K
I then altered the CSV file to add a colon just before the letter G as so:
1,"A",:"G"
2,"B","H"
3,"C","I"
4,"D","J"
5,"E,F","K"
I get:
C1 C2 C3
1 A :"G"
2 B H
3 C I
4 D J
5 E,F K
This lead me to make the following conclusions about FIELDQUOTE:
- The query as a whole scans each row, character by character looking for either a FIELDQUOTE or a FIELDTERMINATOR. If it finds a FIELDQUOTE first then it ignores any FIELDTERMINATORS until it finds another FIELDQUOTE (I'm basing this on the last row).
- The FIELDQUOTE should have a FIELDTERMINATOR immediately before it in order for the FIELDQUOTE to work as expected (I'm basing this on the first row of the second example).
I could be well off the mark here but can anyone give me some feedback on this?.
When I tried running the query above on this CSV:
1,"A"":",G
2,"B","H"
3,"C","I"
4,"D","J"
5,"E,F","K"
It returned the following:
C1 C2 C3
1 A": G
2 B H
3 C I
4 D J
5 E,F K
So I tried with this CSV:
Number,Letter
1,"A""":",G
2,"B","H"
3,"C","I"
4,"D","J"
5,"E,F","K"
However I got the following error:
Error handling external file: 'Unexpected token ':' at [byte: 22]. Expecting tokens ',', ' ', or '"'. '. File/External table name: 'taxi/raw/Sample.csv'.
I then tried the CSV:
Number,Letter
1,"A":",G
2,"B","H"
3,"C","I"
4,"D","J"
5,"E,F","K"
And I got the following:
Error handling external file: 'Unexpected token ':' at [byte: 20]. Expecting tokens ',', ' ', or '"'. '. File/External table name: 'taxi/raw/Sample.csv'.
Lastly, I tried this CSV:
Number,Letter
1,"A":",G
2,"B","H"
3,"C","I"
4,"D","J"
5,"E,F","K"
I got the following error:
Error handling external file: 'Unexpected token ':' at [byte: 20]. Expecting tokens ',', ' ', or '"'. '. File/External table name: 'taxi/raw/Sample.csv'.
The strangest one was when I tried running:
Number,Letter
1,"A""",G
2,"B","H"
3,"C","I"
4,"D","J"
5,"E,F","K"
I got:
C1 C2 C3
1 A"",G2,B" H
3 C I
4 D J
5 E,F K
CodePudding user response:
"The query as a whole scans each row, character by character looking for either a FIELDQUOTE or a FIELDTERMINATOR. If it finds a FIELDQUOTE first then it ignores any FIELDTERMINATORS until it finds another FIELDQUOTE (I'm basing this on the last row)."
From my comment: I believe your first conclusion is accurate. Specifying the character for string encapsulation or
FIELDQUOTE
in this dialect of csv parser, acts exactly as you describe. It allows row delimiters/terminators, field delimiters/terminators, and even the character used for string encapsulation itself (in most parsers when specifying an escape character) to exist in the string literal for that field.Oddly, I don't think that
OPENROWSET()
allows for specifying an escape character, so I'm not sure how you would get a double quote character into your field/string-literal if it's encapsulated. But that's not part of your question just an observation reading through the documentation. I would be curious whatOPENROWSET()
would do with1,"A"":",G
or1,"A":",G
"The FIELDQUOTE should have a FIELDTERMINATOR immediately before it in order for the FIELDQUOTE to work as expected (I'm basing this on the first row of the second example)."
Yes. I believe this is true as well and is true with any CSV parser I've ever used. The string encapsulation character must follow either row or field terminator/delimiter and must be followed by the row or field terminator/delimiter for it to be recognized as an encapsulated string (or not error). I would be curious what
OPENROWSET()
would do with1,"A":,G
.