Bulk insert csv file with semicolon as delimiter


I'm trying to import data from a semicolon-separated CSV file into a SQL Server database. Here is the table structure:

CREATE TABLE [dbo].[waste_facility] 
(
    [Id]           INT             IDENTITY (1, 1) NOT NULL,
    [postcode]     VARCHAR (50)    NULL,
    [name]         VARCHAR (50)    NULL,
    [type]         VARCHAR (255)   NULL,
    [street]       VARCHAR (255)   NULL,
    [suburb]       VARCHAR (255)   NULL,
    [municipality] VARCHAR (255)   NULL,
    [telephone]    VARCHAR (255)   NULL,
    [website]      VARCHAR (255)   NULL,
    [longtitude]   DECIMAL (18, 8) NULL,
    [latitude]     DECIMAL (18, 8) NULL,
    PRIMARY KEY CLUSTERED ([Id] ASC)
);

The CSV file is shown below:

Location Coordinate;Feature Extent;Projection;Postcode;Name Of Facility;Type Of Facility;Street;Suburb;Municipality;Telephone Number;Website;Easting Coordinate;Northing Coordinate;Longitude Coordinate;Latitude Coordinate;Google Maps Direction
-37.9421182892,145.3193857967;"{""coordinates"": [145.3193857967, -37.9421182892], ""type"": ""Point""}";MGA zone 55;3156;Cleanaway Lysterfield Resource Recovery Centre;Recovery Centre;840 Wellington Road;LYSTERFIELD;Yarra Ranges;9753 5411;https://www.cleanaway.com.au/location/lysterfield/;352325;5799275;145.31938579674124;-37.94211828921733;https://www.google.com.au/maps/dir//-37.94211828921733,145.31938579674124/@your location,17z/data=!4m2!4m1!3e0
-38.0529529215,145.2433557709;"{""coordinates"": [145.2433557709, -38.0529529215], ""type"": ""Point""}";MGA zone 55;3175;Smart Recycling (South Eastern Depot);Recycling Centre;185 Dandenong-Hastings Rd;LYNDHURST;Greater Dandenong;8787 3300;https://smartrecycling.com.au/;345876;5786853;145.24335577090602;-38.05295292152536;https://www.google.com.au/maps/dir//-38.05295292152536,145.24335577090602/@your location,17z/data=!4m2!4m1!3e0
-38.0533129717,145.267610135;"{""coordinates"": [145.267610135, -38.0533129717], ""type"": ""Point""}";MGA zone 55;3976;Hampton Park Transfer Station (Outlook Environmental);Transfer Station;274 Hallam Road;HAMPTON PARK;Casey;9554 4502;https://www.suez.com.au/en-au/who-we-are/suez-in-australia-and-new-zealand/our-locations/waste-management-hampton-park-transfer-station;348005;5786853;145.2676101350274;-38.053312971691255;https://www.google.com.au/maps/dir//-38.053312971691255,145.2676101350274/@your location,17z/data=!4m2!4m1!3e0
-38.1243050577,145.2183465487;"{""coordinates"": [145.2183465487, -38.1243050577], ""type"": ""Point""}";MGA zone 55;3977;Frankston Regional Recycling and Recovery Centre;Recycling Centre;20 Harold Road;SKYE;Frankston;1300 322 322;https://www.frankston.vic.gov.au/Environment-and-Waste/Waste-and-Recycling/Frankston-Regional-Recycling-and-Recovery-Centre-FRRRC/Accepted-Items-at-FRRRC;343833;5778893;145.21834654873447;-38.12430505770815;https://www.google.com.au/maps/dir//-38.12430505770815,145.21834654873447/@your location,17z/data=!4m2!4m1!3e0
-38.0973208774,145.4920399066;"{""coordinates"": [145.4920399066, -38.0973208774], ""type"": ""Point""}";MGA zone 55;3810;Pakenham Waste Transfer Station (Future Recycling);Transfer Station;30-32 Exchange Drive;PAKENHAM;Cardinia;13Recycling;https://www.futurerecycling.com.au/;367776;5782313;145.4920399066473;-38.09732087738631;https://www.google.com.au/maps/dir//-38.09732087738631,145.4920399066473/@your location,17z/data=!4m2!4m1!3e0

There are some columns that I don't need, so I created a format file to import only the columns I want. The format file is shown below:

<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <RECORD>
  <FIELD ID="1" xsi:type="CharFixed" LENGTH="50"/>
  <FIELD ID="12" xsi:type="CharFixed" LENGTH="50"/>
  <FIELD ID="13" xsi:type="CharFixed" LENGTH="50"/>
  <FIELD ID="2" xsi:type="CharFixed" LENGTH="50" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
  <FIELD ID="3" xsi:type="CharFixed" LENGTH="50" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
  <FIELD ID="4" xsi:type="CharFixed" LENGTH="255" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
  <FIELD ID="5" xsi:type="CharFixed" LENGTH="255" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
  <FIELD ID="6" xsi:type="CharFixed" LENGTH="255" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
  <FIELD ID="7" xsi:type="CharFixed" LENGTH="255" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
  <FIELD ID="8" xsi:type="CharFixed" LENGTH="255" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
  <FIELD ID="9" xsi:type="CharFixed" LENGTH="255" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
  <FIELD ID="14" xsi:type="CharFixed" LENGTH="50"/>
  <FIELD ID="15" xsi:type="CharFixed" LENGTH="50"/>
  <FIELD ID="10" xsi:type="CharFixed" LENGTH="41"/>
  <FIELD ID="11" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="41"/>
  <FIELD ID="16" xsi:type="CharFixed" LENGTH="50"/>
 </RECORD>
 <ROW>
  <COLUMN SOURCE="2" NAME="postcode" xsi:type="SQLVARYCHAR"/>
  <COLUMN SOURCE="3" NAME="name" xsi:type="SQLVARYCHAR"/>
  <COLUMN SOURCE="4" NAME="type" xsi:type="SQLVARYCHAR"/>
  <COLUMN SOURCE="5" NAME="street" xsi:type="SQLVARYCHAR"/>
  <COLUMN SOURCE="6" NAME="suburb" xsi:type="SQLVARYCHAR"/>
  <COLUMN SOURCE="7" NAME="municipality" xsi:type="SQLVARYCHAR"/>
  <COLUMN SOURCE="8" NAME="telephone" xsi:type="SQLVARYCHAR"/>
  <COLUMN SOURCE="9" NAME="website" xsi:type="SQLVARYCHAR"/>
  <COLUMN SOURCE="10" NAME="longtitude" xsi:type="SQLDECIMAL" PRECISION="18" SCALE="8"/>
  <COLUMN SOURCE="11" NAME="latitude" xsi:type="SQLDECIMAL" PRECISION="18" SCALE="8"/>
 </ROW>
</BCPFORMAT>

Then I tried both BULK INSERT and bcp in, but neither of them works.

Here is the BULK INSERT command:

USE [waste-facility-locations];  

BULK INSERT [dbo].[waste_facility]   
FROM 'E:\onboardingIteration\waste-facility-locations.csv'   
WITH (FORMATFILE = 'E:\onboardingIteration\waste_facility_formatter.xml',
      FIRSTROW = 2,
      LASTROW = 6,
      FIELDTERMINATOR = ';',
      ROWTERMINATOR = '\n',
      ERRORFILE = 'E:\onboardingIteration\myRubbishData.log');  

But unluckily, an error file was generated. Here is what the myRubbishData.log error says:

Row 2 File Offset 1993 ErrorFile Offset 0 - HRESULT 0x80004005

And here is the actual row stored in myRubbishData.txt:

;Pakenham Waste Transfer Station (Future Recycling);Transfer Station;30-32 Exchange Drive;PAKENHAM;Cardinia;13Recycling;https://www.futurerecycling.com.au/;367776;5782313;145.4920399066473;-38.09732087738631;https://www.google.com.au/maps/dir//-38.09732087738631,145.4920399066473/@your location,17z/data=!4m2!4m1!3e0;Pakenham Waste Transfer Station (Future Recycling);Transfer Station;30-32 Exchange Drive;PAKENHAM;Cardinia;13Recycling;https://www.futurerecycling.com.au/;367776;5782313;145.4920399066473;-38.09732087738631;https://www.google.com.au/maps/dir//-38.09

As you can see, it seems the rows are not being separated correctly. So I tried changing the row terminator to "\n", "\r", "\n\r" and "\r\n", but none of them works.

I also tried bcp; it did not work either.

Here is the bcp command I used:

bcp [waste-facility-locations].[dbo].[waste_facility] in "E:\onboardingIteration\waste-facility-locations.csv" -f "E:\onboardingIteration\waste_facility_formatter.xml" -T -S "(LocalDB)\MSSQLLocalDB" -F 2 -t ";" -r "\n"

Then I got an error that says much the same thing:

SQLState = S1000, NativeError = 0
Error = [Microsoft][ODBC Driver 17 for SQL Server]Unexpected EOF encountered in BCP data-file

0 rows copied.
Network packet size (bytes): 4096
Clock Time (ms.) Total : 1

One interesting thing is that if I create a new Excel workbook and use the "Get Data" option to import the CSV file, it is parsed correctly.

Basically, I can't figure out what I did wrong here. Can someone help me with this?

CodePudding user response:

The SQL Server import facilities are very intolerant of bad data and even of minor formatting variations. In my career, I have literally spent thousands of work-hours developing and debugging import procedures for customers, and I can tell you right now that trying to fix this with SQL alone is both difficult and time-consuming.

When you have this problem (bad data and/or inconsistent formatting), it is almost always easier to find or develop a more flexible tool that pre-processes the data into the rigid format SQL expects. So I would say: if Excel can parse the file, use Excel automation to pre-process it and then have SQL import the Excel output. If that's not practical for you, I'd advise writing your own tool in a client language (C#, VB, Java, Python, etc.) to pre-process the files.

You can do it in SQL (and I have done it many times), but I promise you it is a long, complicated trek.
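
If you do want to stay in T-SQL, a minimal sketch of one common route (load everything into an all-text staging table, then filter and convert) is below. It is not a fix for your format file: the staging table name is only illustrative, and it assumes that no field in the file ever contains an embedded semicolon (true of the sample rows you posted) and that the file's line endings match the ROWTERMINATOR you pick.

-- A throwaway staging table that mirrors the 16 fields of the file as plain text.
-- The table name is only illustrative.
CREATE TABLE [dbo].[waste_facility_staging]
(
    [location_coordinate] VARCHAR (100) NULL,
    [feature_extent]      VARCHAR (500) NULL,
    [projection]          VARCHAR (50)  NULL,
    [postcode]            VARCHAR (50)  NULL,
    [name]                VARCHAR (255) NULL,
    [type]                VARCHAR (255) NULL,
    [street]              VARCHAR (255) NULL,
    [suburb]              VARCHAR (255) NULL,
    [municipality]        VARCHAR (255) NULL,
    [telephone]           VARCHAR (255) NULL,
    [website]             VARCHAR (500) NULL,
    [easting]             VARCHAR (50)  NULL,
    [northing]            VARCHAR (50)  NULL,
    [longitude]           VARCHAR (50)  NULL,
    [latitude]            VARCHAR (50)  NULL,
    [maps_direction]      VARCHAR (500) NULL
);

-- Load every field as text; no format file needed, only the terminators.
-- Quoted fields keep their quotes, but those columns are discarded anyway.
-- If the file uses bare LF line endings, use ROWTERMINATOR = '0x0a' instead.
BULK INSERT [dbo].[waste_facility_staging]
FROM 'E:\onboardingIteration\waste-facility-locations.csv'
WITH (FIRSTROW = 2,
      FIELDTERMINATOR = ';',
      ROWTERMINATOR = '\n');

-- Copy only the wanted columns into the real table, converting the numerics.
INSERT INTO [dbo].[waste_facility]
    ([postcode], [name], [type], [street], [suburb], [municipality],
     [telephone], [website], [longtitude], [latitude])
SELECT [postcode], [name], [type], [street], [suburb], [municipality],
       [telephone], [website],
       TRY_CAST([longitude] AS DECIMAL (18, 8)),
       TRY_CAST([latitude]  AS DECIMAL (18, 8))
FROM [dbo].[waste_facility_staging];

Once every field lands as plain text, the unwanted columns simply never get selected, and a bad coordinate shows up as a NULL from TRY_CAST instead of killing the whole load.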

SSIS has more flexible error handling for problems like this, but if you are not already familiar with it, it has a very steep learning curve, and your first SSIS project is likely to be very time-consuming as well.
