I'm trying to import data from semicolon separated csv file into a SQL Server database. Here is the table structure
CREATE TABLE [dbo].[waste_facility]
(
[Id] INT IDENTITY (1, 1) NOT NULL,
[postcode] VARCHAR (50) NULL,
[name] VARCHAR (50) NULL,
[type] VARCHAR (255) NULL,
[street] VARCHAR (255) NULL,
[suburb] VARCHAR (255) NULL,
[municipality] VARCHAR (255) NULL,
[telephone] VARCHAR (255) NULL,
[website] VARCHAR (255) NULL,
[longtitude] DECIMAL (18, 8) NULL,
[latitude] DECIMAL (18, 8) NULL,
PRIMARY KEY CLUSTERED ([Id] ASC)
);
The csv file is shown below:
Location Coordinate;Feature Extent;Projection;Postcode;Name Of Facility;Type Of Facility;Street;Suburb;Municipality;Telephone Number;Website;Easting Coordinate;Northing Coordinate;Longitude Coordinate;Latitude Coordinate;Google Maps Direction
-37.9421182892,145.3193857967;"{""coordinates"": [145.3193857967, -37.9421182892], ""type"": ""Point""}";MGA zone 55;3156;Cleanaway Lysterfield Resource Recovery Centre;Recovery Centre;840 Wellington Road;LYSTERFIELD;Yarra Ranges;9753 5411;https://www.cleanaway.com.au/location/lysterfield/;352325;5799275;145.31938579674124;-37.94211828921733;https://www.google.com.au/maps/dir//-37.94211828921733,145.31938579674124/@your location,17z/data=!4m2!4m1!3e0
-38.0529529215,145.2433557709;"{""coordinates"": [145.2433557709, -38.0529529215], ""type"": ""Point""}";MGA zone 55;3175;Smart Recycling (South Eastern Depot);Recycling Centre;185 Dandenong-Hastings Rd;LYNDHURST;Greater Dandenong;8787 3300;https://smartrecycling.com.au/;345876;5786853;145.24335577090602;-38.05295292152536;https://www.google.com.au/maps/dir//-38.05295292152536,145.24335577090602/@your location,17z/data=!4m2!4m1!3e0
-38.0533129717,145.267610135;"{""coordinates"": [145.267610135, -38.0533129717], ""type"": ""Point""}";MGA zone 55;3976;Hampton Park Transfer Station (Outlook Environmental);Transfer Station;274 Hallam Road;HAMPTON PARK;Casey;9554 4502;https://www.suez.com.au/en-au/who-we-are/suez-in-australia-and-new-zealand/our-locations/waste-management-hampton-park-transfer-station;348005;5786853;145.2676101350274;-38.053312971691255;https://www.google.com.au/maps/dir//-38.053312971691255,145.2676101350274/@your location,17z/data=!4m2!4m1!3e0
-38.1243050577,145.2183465487;"{""coordinates"": [145.2183465487, -38.1243050577], ""type"": ""Point""}";MGA zone 55;3977;Frankston Regional Recycling and Recovery Centre;Recycling Centre;20 Harold Road;SKYE;Frankston;1300 322 322;https://www.frankston.vic.gov.au/Environment-and-Waste/Waste-and-Recycling/Frankston-Regional-Recycling-and-Recovery-Centre-FRRRC/Accepted-Items-at-FRRRC;343833;5778893;145.21834654873447;-38.12430505770815;https://www.google.com.au/maps/dir//-38.12430505770815,145.21834654873447/@your location,17z/data=!4m2!4m1!3e0
-38.0973208774,145.4920399066;"{""coordinates"": [145.4920399066, -38.0973208774], ""type"": ""Point""}";MGA zone 55;3810;Pakenham Waste Transfer Station (Future Recycling);Transfer Station;30-32 Exchange Drive;PAKENHAM;Cardinia;13Recycling;https://www.futurerecycling.com.au/;367776;5782313;145.4920399066473;-38.09732087738631;https://www.google.com.au/maps/dir//-38.09732087738631,145.4920399066473/@your location,17z/data=!4m2!4m1!3e0
There are some columns that I don't need, so I create a format file to import the data. The format file is shown as below
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharFixed" LENGTH="50"/>
<FIELD ID="12" xsi:type="CharFixed" LENGTH="50"/>
<FIELD ID="13" xsi:type="CharFixed" LENGTH="50"/>
<FIELD ID="2" xsi:type="CharFixed" LENGTH="50" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="3" xsi:type="CharFixed" LENGTH="50" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="4" xsi:type="CharFixed" LENGTH="255" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="5" xsi:type="CharFixed" LENGTH="255" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="6" xsi:type="CharFixed" LENGTH="255" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="7" xsi:type="CharFixed" LENGTH="255" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="8" xsi:type="CharFixed" LENGTH="255" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="9" xsi:type="CharFixed" LENGTH="255" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="14" xsi:type="CharFixed" LENGTH="50"/>
<FIELD ID="15" xsi:type="CharFixed" LENGTH="50"/>
<FIELD ID="10" xsi:type="CharFixed" LENGTH="41"/>
<FIELD ID="11" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="41"/>
<FIELD ID="16" xsi:type="CharFixed" LENGTH="50"/>
</RECORD>
<ROW>
<COLUMN SOURCE="2" NAME="postcode" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="3" NAME="name" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="4" NAME="type" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="5" NAME="street" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="6" NAME="suburb" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="7" NAME="municipality" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="8" NAME="telephone" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="9" NAME="website" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="10" NAME="longtitude" xsi:type="SQLDECIMAL" PRECISION="18" SCALE="8"/>
<COLUMN SOURCE="11" NAME="latitude" xsi:type="SQLDECIMAL" PRECISION="18" SCALE="8"/>
</ROW>
</BCPFORMAT>
Then I tried both bulk insert and bcp in
- neither of them works.
Here is the bulk insert command
USE [waste-facility-locations];
BULK INSERT [dbo].[waste_facility]
FROM 'E:\onboardingIteration\waste-facility-locations.csv'
WITH (FORMATFILE = 'E:\onboardingIteration\waste_facility_formatter.xml',
FIRSTROW = 2,
LASTROW = 6,
FIELDTERMINATOR = ';',
ROWTERMINATOR = '\n',
ERRORFILE = 'E:\onboardingIteration\myRubbishData.log');
But unlucky some error file were generated. Here is what myRubbishData.log
error says:
Row 2 File Offset 1993 ErrorFile Offset 0 - HRESULT 0x80004005
And the actual row stored in myRubbishData.txt
:
;Pakenham Waste Transfer Station (Future Recycling);Transfer Station;30-32 Exchange Drive;PAKENHAM;Cardinia;13Recycling;https://www.futurerecycling.com.au/;367776;5782313;145.4920399066473;-38.09732087738631;https://www.google.com.au/maps/dir//-38.09732087738631,145.4920399066473/@your location,17z/data=!4m2!4m1!3e0;Pakenham Waste Transfer Station (Future Recycling);Transfer Station;30-32 Exchange Drive;PAKENHAM;Cardinia;13Recycling;https://www.futurerecycling.com.au/;367776;5782313;145.4920399066473;-38.09732087738631;https://www.google.com.au/maps/dir//-38.09
As you can see, it seems like rows are not correctly separated. So I tried to change the row delimiter to "\n","\r","\n\r","\r\n", none of them work.
And I tried bcp
. It did not work either.
Here is the bcp
command I used:
bcp [waste-facility-locations].[dbo].[waste_facility] in "E:\onboardingIteration\waste-facility-locations.csv" -f "E:\onboardingIteration\waste_facility_formatter.xml" -T -S "(LocalDB)\MSSQLLocalDB" -F 2 -t ";" -r "\n"
Then I get an error said somehow the same thing
SQLState = S1000, NativeError = 0
Error = [Microsoft][ODBC Driver 17 for SQL Server]Unexpected EOF encountered in BCP data-file0 rows copied.
Network packet size (bytes): 4096
Clock Time (ms.) Total : 1
One interesting things is, if I create a new excel and choose "Get data" option to import the csv file, the file can be literally correctly parsed.
Basically I can't find what I did wrong here. Can someone help me on this one?
CodePudding user response:
The SQL Server import facilities are very intolerant of bad data and even just formatting variations or options. In my career, I have literally spent thousands of work-hours trying to develop and debug import procedures for customers. I can tell you right now, that trying to fix this with SQL alone is both difficult and time-consuming.
When you have this problem (bad data and/or inconsistent formatting) it is almost always easier to find or develop a more flexible tool to pre-process the data into the rigid standard that SQL expects. So I would say that if Excel can parse it then just use Excel automation to pre-process them and then use SQL to import the Excel output. If that's not practical for you, then I'd advise writing your own tool in some client language (C#, Vb, Java, Python, etc.) to pre-process the files.
You can do it in SQL (and I have done it many times), but I promise you that it is a long complicated trek.
SSIS has more flexible error-handling for problems like this, but if you are not already familiar and using it, it has a very steep learning curve and your first SSIS project is likely to be very time-consuming also.