Bulk import xml file to SQL server-CodePudding

Issue: I have a single .xml file containing 13.5k of data sets and need to import it into SSMS. Unfortunatly, it contain a style I never met before; first a column declaration part, then the data part without any specific column names. With thus, I have issues to catch the needed fields. On top of that the .xml may even be corrupt (incorrect hierarchical structure).

Xml:

<?xml version="1.0" encoding="UTF-8"?>
<Root>
    <DMSContent format="LOL"/>
    <Archive name="Adressdossier" id="52" osguid="43AAEC21AC6C40F1BEDB34D92512ED84"/>
    <ObjectType name="Dokument" internal_name="CitizenFileDocument" id="262216" osguid="F287C984EB9E48BEA280BA46C305567C" type="DOCUMENT" modul="MULTIDOC"/>

    <Rowset>
        <Columns>
            <Column name="Salutation" type="TEXT" ostype="X" size="50" otype="FOLDER"/>
            <Column name="Name" type="TEXT" ostype="X" size="200" otype="FOLDER"/>
            <Column name="FirstName" type="TEXT" ostype="X" size="100" otype="FOLDER"/>
            <Column name="StreetNo" type="TEXT" ostype="X" size="100" otype="FOLDER"/>
            <Column name="City" type="TEXT" ostype="X" size="150" otype="FOLDER"/>
            <Column name="ZIP" type="TEXT" ostype="X" size="50" otype="FOLDER"/>
            <Column name="Country" type="TEXT" ostype="X" size="50" otype="FOLDER"/>
            <Column name="Birthday" type="DATE" ostype="D" size="50" otype="FOLDER"/>
            <Column name="Filename" type="INTEGER" ostype="9" size="100" otype="FOLDER"/>
        </Columns>
    </Rowset>
    
    <Rows>
        <Row id="2538">
            <Value>Mrs</Value>
            <Value>Doe</Value>
            <Value>Jane</Value>
            <Value>Main Street 5</Value>        
            <Value>Ghost Town</Value>
            <Value>5315</Value>
            <Value>Switzerland</Value>      
            <Value>12.12.2017</Value>
            <Value>jp4_B025DF7DBAFC49879103ECB8AE59C3A2.docx</Value>
        </Row>
        
        <Row id="2579">
            <Value>Mr</Value>
            <Value>Ding</Value>
            <Value>Chavez</Value>
            <Value>Sun Boulevard 3a</Value>     
            <Value>Alien City</Value>
            <Value>4586</Value>
            <Value>Germany</Value>      
            <Value>01.01.1980</Value>
            <Value>jp4_DCA9345C93E84F1697668E6ACDC596C9.docx</Value>
        </Row>
        
        <Row id="2580">
            <Value>Mr</Value>
            <Value>Dale</Value>
            <Value>Dick</Value>
            <Value>Beach Avenue 13</Value>      
            <Value>Zombie Village</Value>
            <Value>9513</Value>
            <Value>Italy</Value>        
            <Value>09.11.1911</Value>
            <Value>jp4_5DDBF2A05BD0421A8C53B0CC4EB64232.doc</Value>
        </Row>
    </Rows>
</Root>

The usually used MS-Sql code snippet, of course not working for this type of .xml-structure:

set ansi_nulls on;
declare @xmlfile xml;

select @xmlfile = bulkcolumn
from  openrowset(bulk 'C:\Meta.xml', single_blob) x;

select 
       id           =      c.value('@id', 'int'),
       Salutation   =      c.value('(Column[@k="Salutation"]/@v)[1]', 'varchar(60)'),
       [Name]       =      c.value('(Column[@k="name"]/@v)[1]', 'varchar(100)'),
       Birthday     =      c.value('(Column[@k="Birthday"]/@v)[1]', 'date'),
       [Filename]   =      c.value('(Column[@k="Filename"]/@v)[1]', 'varchar(100)')

into #Meta --

from @xmlfile.nodes('/root/rows') as T(c);

set ansi_nulls off;

Thank you in advance for any help!

CodePudding user response：

SQL Server doesn't support fn::position() or preceding-sibling:: syntaxes. But you can use a hack involving << to get the position of each node.

So we calculate the position of each Column node, then push those values into the Value lookups

SELECT
  id         = x2.Row.value('@id', 'int'),
  Salutation = x2.Row.value('(Value[sql:column("ColIndex.Salutation")]/text())[1]', 'varchar(60)'),
  [Name]     = x2.Row.value('(Value[sql:column("ColIndex.Name"      )]/text())[1]', 'varchar(100)'),
  Birthday   = x2.Row.value('(Value[sql:column("ColIndex.Birthday"  )]/text())[1]', 'date'),
  [Filename] = x2.Row.value('(Value[sql:column("ColIndex.Filename"  )]/text())[1]', 'varchar(100)')

FROM @xml.nodes('/Root/Rowset/Columns') x1(Col)
CROSS APPLY (
    SELECT
       Salutation = x1.Col.value('let $c:= Column[@name="Salutation"][1] return count(Column[. << $c])   1', 'int'),
       [Name]     = x1.Col.value('let $c:= Column[@name="Name"]      [1] return count(Column[. << $c])   1', 'int'),
       Birthday   = x1.Col.value('let $c:= Column[@name="Birthday"]  [1] return count(Column[. << $c])   1', 'int'),
       [Filename] = x1.Col.value('let $c:= Column[@name="Filename"]  [1] return count(Column[. << $c])   1', 'int')
) ColIndex

CROSS APPLY @xml.nodes('/Root/Rows/Row') x2(Row);

db<>fiddle

CodePudding user response：

If you want to keep your current approach of importing the file, you can, with the following changes:

set ansi_nulls on;
declare @xmlfile xml;

select @xmlfile = bulkcolumn
from  openrowset(bulk 'C:\Meta.xml', single_blob) x;

select 
       id           =      c.value('@id', 'int'),
       Salutation   =      c.value('(Value[count(/Root/Rowset/Columns/Column[@name="Salutation"]/preceding-sibling::*)   1]/text())[1]', 'varchar(60)'),
       [Name]       =      c.value('(Value[count(/Root/Rowset/Columns/Column[@name="Name"]/preceding-sibling::*)   1]/text())[1]', 'varchar(100)'),
       Birthday     =      c.value('(Value[count(/Root/Rowset/Columns/Column[@name="Birthday"]/preceding-sibling::*)   1]/text())[1]', 'date'),
       [Filename]   =      c.value('(Value[count(/Root/Rowset/Columns/Column[@name="Filename"]/preceding-sibling::*)   1]/text())[1]', 'varchar(100)')

into #Meta --

from @xmlfile.nodes('/Root/Rows/Row') as T(c);

set ansi_nulls off;

This finds the right <Value> position by looking up the <Column> of the given name and figuring out how many columns precede it. Not pretty, but effective.

If this is a one-off and/or you're certain of the column order, you can of course access the values directly.

       Birthday     =      c.value('(Value[8]/text())[1]', 'varchar(60)'),