Issue: I have a single .xml file containing 13.5k of data sets and need to import it into SSMS. Unfortunatly, it contain a style I never met before; first a column declaration part, then the data part without any specific column names. With thus, I have issues to catch the needed fields. On top of that the .xml may even be corrupt (incorrect hierarchical structure).
Xml:
<?xml version="1.0" encoding="UTF-8"?>
<Root>
<DMSContent format="LOL"/>
<Archive name="Adressdossier" id="52" osguid="43AAEC21AC6C40F1BEDB34D92512ED84"/>
<ObjectType name="Dokument" internal_name="CitizenFileDocument" id="262216" osguid="F287C984EB9E48BEA280BA46C305567C" type="DOCUMENT" modul="MULTIDOC"/>
<Rowset>
<Columns>
<Column name="Salutation" type="TEXT" ostype="X" size="50" otype="FOLDER"/>
<Column name="Name" type="TEXT" ostype="X" size="200" otype="FOLDER"/>
<Column name="FirstName" type="TEXT" ostype="X" size="100" otype="FOLDER"/>
<Column name="StreetNo" type="TEXT" ostype="X" size="100" otype="FOLDER"/>
<Column name="City" type="TEXT" ostype="X" size="150" otype="FOLDER"/>
<Column name="ZIP" type="TEXT" ostype="X" size="50" otype="FOLDER"/>
<Column name="Country" type="TEXT" ostype="X" size="50" otype="FOLDER"/>
<Column name="Birthday" type="DATE" ostype="D" size="50" otype="FOLDER"/>
<Column name="Filename" type="INTEGER" ostype="9" size="100" otype="FOLDER"/>
</Columns>
</Rowset>
<Rows>
<Row id="2538">
<Value>Mrs</Value>
<Value>Doe</Value>
<Value>Jane</Value>
<Value>Main Street 5</Value>
<Value>Ghost Town</Value>
<Value>5315</Value>
<Value>Switzerland</Value>
<Value>12.12.2017</Value>
<Value>jp4_B025DF7DBAFC49879103ECB8AE59C3A2.docx</Value>
</Row>
<Row id="2579">
<Value>Mr</Value>
<Value>Ding</Value>
<Value>Chavez</Value>
<Value>Sun Boulevard 3a</Value>
<Value>Alien City</Value>
<Value>4586</Value>
<Value>Germany</Value>
<Value>01.01.1980</Value>
<Value>jp4_DCA9345C93E84F1697668E6ACDC596C9.docx</Value>
</Row>
<Row id="2580">
<Value>Mr</Value>
<Value>Dale</Value>
<Value>Dick</Value>
<Value>Beach Avenue 13</Value>
<Value>Zombie Village</Value>
<Value>9513</Value>
<Value>Italy</Value>
<Value>09.11.1911</Value>
<Value>jp4_5DDBF2A05BD0421A8C53B0CC4EB64232.doc</Value>
</Row>
</Rows>
</Root>
The usually used MS-Sql code snippet, of course not working for this type of .xml-structure:
set ansi_nulls on;
declare @xmlfile xml;
select @xmlfile = bulkcolumn
from openrowset(bulk 'C:\Meta.xml', single_blob) x;
select
id = c.value('@id', 'int'),
Salutation = c.value('(Column[@k="Salutation"]/@v)[1]', 'varchar(60)'),
[Name] = c.value('(Column[@k="name"]/@v)[1]', 'varchar(100)'),
Birthday = c.value('(Column[@k="Birthday"]/@v)[1]', 'date'),
[Filename] = c.value('(Column[@k="Filename"]/@v)[1]', 'varchar(100)')
into #Meta --
from @xmlfile.nodes('/root/rows') as T(c);
set ansi_nulls off;
Thank you in advance for any help!
CodePudding user response:
SQL Server doesn't support fn::position()
or preceding-sibling::
syntaxes. But you can use a hack involving <<
to get the position of each node.
So we calculate the position of each Column
node, then push those values into the Value
lookups
SELECT
id = x2.Row.value('@id', 'int'),
Salutation = x2.Row.value('(Value[sql:column("ColIndex.Salutation")]/text())[1]', 'varchar(60)'),
[Name] = x2.Row.value('(Value[sql:column("ColIndex.Name" )]/text())[1]', 'varchar(100)'),
Birthday = x2.Row.value('(Value[sql:column("ColIndex.Birthday" )]/text())[1]', 'date'),
[Filename] = x2.Row.value('(Value[sql:column("ColIndex.Filename" )]/text())[1]', 'varchar(100)')
FROM @xml.nodes('/Root/Rowset/Columns') x1(Col)
CROSS APPLY (
SELECT
Salutation = x1.Col.value('let $c:= Column[@name="Salutation"][1] return count(Column[. << $c]) 1', 'int'),
[Name] = x1.Col.value('let $c:= Column[@name="Name"] [1] return count(Column[. << $c]) 1', 'int'),
Birthday = x1.Col.value('let $c:= Column[@name="Birthday"] [1] return count(Column[. << $c]) 1', 'int'),
[Filename] = x1.Col.value('let $c:= Column[@name="Filename"] [1] return count(Column[. << $c]) 1', 'int')
) ColIndex
CROSS APPLY @xml.nodes('/Root/Rows/Row') x2(Row);
CodePudding user response:
If you want to keep your current approach of importing the file, you can, with the following changes:
set ansi_nulls on;
declare @xmlfile xml;
select @xmlfile = bulkcolumn
from openrowset(bulk 'C:\Meta.xml', single_blob) x;
select
id = c.value('@id', 'int'),
Salutation = c.value('(Value[count(/Root/Rowset/Columns/Column[@name="Salutation"]/preceding-sibling::*) 1]/text())[1]', 'varchar(60)'),
[Name] = c.value('(Value[count(/Root/Rowset/Columns/Column[@name="Name"]/preceding-sibling::*) 1]/text())[1]', 'varchar(100)'),
Birthday = c.value('(Value[count(/Root/Rowset/Columns/Column[@name="Birthday"]/preceding-sibling::*) 1]/text())[1]', 'date'),
[Filename] = c.value('(Value[count(/Root/Rowset/Columns/Column[@name="Filename"]/preceding-sibling::*) 1]/text())[1]', 'varchar(100)')
into #Meta --
from @xmlfile.nodes('/Root/Rows/Row') as T(c);
set ansi_nulls off;
This finds the right <Value>
position by looking up the <Column>
of the given name and figuring out how many columns precede it. Not pretty, but effective.
If this is a one-off and/or you're certain of the column order, you can of course access the values directly.
Birthday = c.value('(Value[8]/text())[1]', 'varchar(60)'),