Home > OS >  Is my understanding of how FIELDQUOTE works correct?
Is my understanding of how FIELDQUOTE works correct?

Time:08-19

I am using Azure Synapse and I m querying the following CSV file:

1,"A","G"
2,"B","H"
3,"C","I"
4,"D","J"
5,"E,F","K"

Here is the T-SQL query:

SELECT *
FROM
    OPENROWSET(
        BULK 'taxi/raw/Sample.csv',
        DATA_SOURCE = 'nyc_taxidata',
        FORMAT = 'CSV',
        PARSER_VERSION = '2.0',
        FIRSTROW=2,
        FIELDTERMINATOR = ',',
        ROWTERMINATOR = '\n',
        FIELDQUOTE = '"'
    )
    AS [result]
 

Here is the output:

C1   C2   C3
1    A    G
2    B    H
3    C    I
4    D    J
5    E,F  K

I then altered the CSV file to add a colon just before the letter G as so:

1,"A",:"G"
2,"B","H"
3,"C","I"
4,"D","J"
5,"E,F","K"

I get:

C1   C2   C3
1    A    :"G"
2    B    H
3    C    I
4    D    J
5    E,F  K

This lead me to make the following conclusions about FIELDQUOTE:

  • The query as a whole scans each row, character by character looking for either a FIELDQUOTE or a FIELDTERMINATOR. If it finds a FIELDQUOTE first then it ignores any FIELDTERMINATORS until it finds another FIELDQUOTE (I'm basing this on the last row).
  • The FIELDQUOTE should have a FIELDTERMINATOR immediately before it in order for the FIELDQUOTE to work as expected (I'm basing this on the first row of the second example).

I could be well off the mark here but can anyone give me some feedback on this?.

When I tried running the query above on this CSV:

1,"A"":",G
2,"B","H"
3,"C","I"
4,"D","J"
5,"E,F","K"

It returned the following:

 C1   C2  C3
1    A":  G
2    B    H
3    C    I
4    D    J
5    E,F  K

So I tried with this CSV:

Number,Letter
1,"A""":",G
2,"B","H"
3,"C","I"
4,"D","J"
5,"E,F","K"

However I got the following error:

Error handling external file: 'Unexpected token ':' at [byte: 22]. Expecting tokens ',', ' ', or '"'. '. File/External table name: 'taxi/raw/Sample.csv'.

I then tried the CSV:

Number,Letter
1,"A":",G
2,"B","H"
3,"C","I"
4,"D","J"
5,"E,F","K"

And I got the following:

Error handling external file: 'Unexpected token ':' at [byte: 20]. Expecting tokens ',', ' ', or '"'. '. File/External table name: 'taxi/raw/Sample.csv'.

Lastly, I tried this CSV:

Number,Letter
1,"A":",G
2,"B","H"
3,"C","I"
4,"D","J"
5,"E,F","K"

I got the following error:

Error handling external file: 'Unexpected token ':' at [byte: 20]. Expecting tokens ',', ' ', or '"'. '. File/External table name: 'taxi/raw/Sample.csv'.

The strangest one was when I tried running:

Number,Letter
1,"A""",G
2,"B","H"
3,"C","I"
4,"D","J"
5,"E,F","K"

I got:

 C1   C2        C3
1    A"",G2,B"  H
3    C          I
4    D          J
5    E,F        K

CodePudding user response:

  1. "The query as a whole scans each row, character by character looking for either a FIELDQUOTE or a FIELDTERMINATOR. If it finds a FIELDQUOTE first then it ignores any FIELDTERMINATORS until it finds another FIELDQUOTE (I'm basing this on the last row)."

    From my comment: I believe your first conclusion is accurate. Specifying the character for string encapsulation or FIELDQUOTE in this dialect of csv parser, acts exactly as you describe. It allows row delimiters/terminators, field delimiters/terminators, and even the character used for string encapsulation itself (in most parsers when specifying an escape character) to exist in the string literal for that field.

    Oddly, I don't think that OPENROWSET() allows for specifying an escape character, so I'm not sure how you would get a double quote character into your field/string-literal if it's encapsulated. But that's not part of your question just an observation reading through the documentation. I would be curious what OPENROWSET() would do with 1,"A"":",G or 1,"A":",G

  2. "The FIELDQUOTE should have a FIELDTERMINATOR immediately before it in order for the FIELDQUOTE to work as expected (I'm basing this on the first row of the second example)."

    Yes. I believe this is true as well and is true with any CSV parser I've ever used. The string encapsulation character must follow either row or field terminator/delimiter and must be followed by the row or field terminator/delimiter for it to be recognized as an encapsulated string (or not error). I would be curious what OPENROWSET() would do with 1,"A":,G.

  • Related