What is the difference between using 0x0a and \n for ROWTERMINATOR when importing data from an external source?
I tried to query data from a JSON file into rows and columns and I got different outcomes. Here is an image of the JSON file:
Here is the code I used:
SELECT TOP 10*
FROM OPENROWSET(
BULK 'taxi/raw/payment_type_array.json',
DATA_SOURCE='nyc_taxidata',
FORMAT='CSV',
PARSER_VERSION='1.0',
FIELDTERMINATOR='0x0b',
FIELDQUOTE='0x0b',
ROWTERMINATOR='\n'
)
WITH
(jsonDoc NVARCHAR(MAX)
) AS payment_type
CROSS APPLY OPENJSOn(jsonDoc)
WITH(
payment_type SMALLINT,
payment_type_desc NVARCHAR(MAX) AS JSON
);
Here is the outcome:
When I used '0x0a' as the FIELDTERMINATOR I got the following:
CodePudding user response:
From the documentation
Specifying
\n
as a Row Terminator for Bulk ImportWhen you specify
\n
as a row terminator for bulk import, or implicitly use the default row terminator, bcp and the BULK INSERT statement expect a carriage return-line feed combination (CRLF) as the row terminator. If your source file uses a line feed character only (LF) as the row terminator - as is typical in files generated on Unix and Linux computers - use hexadecimal notation to specify the LF row terminator.
So if you are getting different results for \n
and 0x0a
then you have some lines that are just using a line feed (LF) character, and some which are using carriage return (CR) followed by a line feed (LF) character (CRLF). This is a problem with the source data, in my opinion, as using inconsistent new line characters can (and does) cause problems.