I have the following XML file which i want to load into SQL:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns4:Dataview1 xmlns="ux:no::ehe:v5:actual:aver" xmlns:ns4="ux:no:ehe:v5:move" xmlns:ns2="ux:no:ehe:v5:centre" xmlns:ns5="ux:no:ehe:v5:tran">
<ns4:Content versionCode="16000">
<ns4:gen start="1961-07-01" end="1961-07-01">
</ns4:gen>
<Weight unit="KG">30.0000</Weight>
<ns4:gen start="2017-09-19">
</ns4:gen>
<ns4:gen start="1961-07-02" end="2016-09-30">
<ns5:translation>
<ns2:En>Example Data English 234354</ns2:En>
</ns5:translation>
<Type>Circle</Type>
<Weight unit="KG">10.0000</Weight>
</ns4:gen>
<ns4:gen start="2016-10-01" end="2017-09-18">
<ns5:translation>
<ns2:De>Beispieldaten Deutschland 23443</ns2:De>
<ns2:En>Example Data English 23443</ns2:En>
</ns5:translation>
<Weight unit="KG">20.0000</Weight>
</ns4:gen>
</ns4:Content>
</ns4:Dataview1>
Following code is used:
CREATE TABLE tbl (
ID INT IDENTITY(1, 1) PRIMARY KEY,
XmlColumn XML
);
INSERT INTO tbl(XmlColumn)
SELECT * FROM OPENROWSET(BULK N'C:\Data\demoset.xml', SINGLE_BLOB) AS x;
WITH XMLNAMESPACES(DEFAULT 'ux:no::ehe:v5:actual:aver',
'ux:no:ehe:v5:tran' as ns5,
'ux:no:ehe:v5:move' AS ns4,
'ux:no:ehe:v5:cat:fill' as ns3,
'ux:no:ehe:v5:centre' as ns2)
SELECT c.value('@versionCode', 'VARCHAR(100)') as versionCode,
x.value('@start', 'DATE') as Start_date,
x.value('@end', 'DATE') as End_date,
x.value('(ns5:translation/ns2:En)[1]', 'VARCHAR(100)') as Trans_EN,
x.value('Type[1]', 'VARCHAR(100)') as Typed,
x.value('(Weight/unit)[1]', 'VARCHAR(100)') as Unitdata,
x.value('Weight[1]', 'VARCHAR(100)') as Weightdata
INTO dbo.testtbl
FROM tbl
CROSS APPLY XmlColumn.nodes('/ns4:Dataview1/ns4:Content') AS t1(c)
OUTER APPLY t1.c.nodes('ns4:gen') AS t2(x);
I have three questions:
1.) I don't understand why it is not possible to use @ns2:En, @Type, @Weight. Therefore, why do i need to place [1] behind data ns2:En, Type & Weight?
2.) Why is '(Weight/unit)[1]' not working?
3.) If I change outer apply to cross apply results are the same. I think i need to use outer apply since it is a left outer join in my opinion, but I don't understand completely. In my previous topic cross apply is used in the answer. Slow XML import with SQL server
CodePudding user response:
-
.value
always needs a singleton: in other words it must be statically known (upfront) that there will be exactly one node to retrieve. This is normally done using a predicate[1]
to retrieve the first node.- Retrieving a whole node using
.value
implicitly takes thetext()
node (of which there may be more than one), however it is better and more performant to specify it explicitly like(Type/text())[1]
- In the case of attributes (see 2.), there is always one unique attribute, so this doesn't need
[1]
. But this only applies if you are retrieving a top-level attribute. If you go down a level then you may have multiple attributes, again requiring[1]
.
- Attributes are retrieved using
@
, so you need(Weight/@unit)[1]
. - The
.nodes
function shreds the XML into separate rows.CROSS APPLY
implies that you expect rows and to act like anINNER JOIN
, whereasOUTER APPLY
implies that there may not be rows but you still want nulls, like aLEFT JOIN
. Up to you as to what you expect.
WITH XMLNAMESPACES(DEFAULT 'ux:no::ehe:v5:actual:aver',
'ux:no:ehe:v5:tran' as ns5,
'ux:no:ehe:v5:move' AS ns4,
'ux:no:ehe:v5:cat:fill' as ns3,
'ux:no:ehe:v5:centre' as ns2)
SELECT c.value('@versionCode', 'VARCHAR(100)') as versionCode,
x.value('@start', 'DATE') as Start_date,
x.value('@end', 'DATE') as End_date,
x.value('(ns5:translation/ns2:En/text())[1]', 'VARCHAR(100)') as Trans_EN,
x.value('(Type/text())[1]', 'VARCHAR(100)') as Typed,
x.value('(Weight/@unit)[1]', 'VARCHAR(20)') as Unitdata,
x.value('(Weight/text())[1]', 'decimal(18,9)') as Weightdata
FROM tbl
CROSS APPLY tbl.XmlColumn.nodes('/ns4:Dataview1/ns4:Content') AS t1(c)
OUTER APPLY t1.c.nodes('ns4:gen') AS t2(x);
You don't need to create a table just to use OPENROWSET
, you can feed it straight in:
WITH XMLNAMESPACES(DEFAULT 'ux:no::ehe:v5:actual:aver',
'ux:no:ehe:v5:tran' as ns5,
'ux:no:ehe:v5:move' AS ns4,
'ux:no:ehe:v5:cat:fill' as ns3,
'ux:no:ehe:v5:centre' as ns2)
SELECT c.value('@versionCode', 'VARCHAR(100)') as versionCode,
x.value('@start', 'DATE') as Start_date,
x.value('@end', 'DATE') as End_date,
x.value('(ns5:translation/ns2:En/text())[1]', 'VARCHAR(100)') as Trans_EN,
x.value('(Type/text())[1]', 'VARCHAR(100)') as Typed,
x.value('(Weight/@unit)[1]', 'VARCHAR(20)') as Unitdata,
x.value('(Weight/text())[1]', 'decimal(18,9)') as Weightdata
FROM OPENROWSET(BULK N'C:\Data\demoset.xml', SINGLE_BLOB) AS blk(col)
CROSS APPLY (VALUES (
CAST(blk.col AS xml)
) ) AS t(XmlColumn)
CROSS APPLY t.XmlColumn.nodes('/ns4:Dataview1/ns4:Content') AS t1(c)
OUTER APPLY t1.c.nodes('ns4:gen') AS t2(x);