I have an XML document, which I've simplified for this question, in the following format:
<?xml version="1.0"?>
<xml>
<aggregateddata>
<aggregateddata>
<item value="abcdefg1" name="id"/>
<item value="1" name="dataSet"/>
<item value="Aggregates" name="dataSetLabel"/>
<item value="Physical Flow" name="indicator"/>
<item value="day" name="periodType"/>
<item value="2021-10-16T06:00:00 02:00" name="periodFrom"/>
<item value="2021-10-17T06:00:00 02:00" name="periodTo"/>
<item value="BE" name="countryKey"/>
<item value="L-Zone" name="bzShort"/>
</aggregateddata>
<aggregateddata>
<item value="abcdefg2" name="id"/>
<item value="1" name="dataSet"/>
<item value="Aggregates" name="dataSetLabel"/>
<item value="Physical Flow" name="indicator"/>
<item value="day" name="periodType"/>
<item value="2021-10-16T06:00:00 02:00" name="periodFrom"/>
<item value="2021-10-17T06:00:00 02:00" name="periodTo"/>
<item value="UK" name="countryKey"/>
<item value="L-Zone" name="bzShort"/>
</aggregateddata>
</aggregateddata>
</xml>
I want to be able to read this, ideally having each group of values on its own row. Turning it into something more SQL server friendly Like this:
id | dataSet | dataSetLabel | indicator | periodType | periodFrom | periodTo | countryKey | bzShort |
---|---|---|---|---|---|---|---|---|
abcdefg1 | 1 | Aggregates | Physical Flow | day | 2021-10-16T06:00:00 02:00 | 2021-10-17T06:00:00 02:00 | BE | L-Zone |
abcdefg2 | 1 | Aggregates | Physical Flow | day | 2021-10-16T06:00:00 02:00 | 2021-10-17T06:00:00 02:00 | UK | L-Zone |
To do this, I though I could read the XML using the following:
select
XMLDataNodes.x.value('@name', 'varchar(50)') as FieldName,
XMLDataNodes.x.value('@value', 'varchar(500)') as FieldValue
from
@XmlFile.nodes ('/xml/aggregateddata/aggregateddata/item') as XMLDataNodes(x)
And then use some sort of dynamic pivot to then turn the results into what I need.
The problem though, is there is nothing I can 'group by' - basically no value on the parent note to put next to them. I have tried something like this https://www.sqlservercentral.com/forums/topic/how-to-uniquely-number-parent-and-child-nodes-while-reading-an-xml-document, to add an identifier to each group, but it is incredibly slow on the actual full dataset:
select
XMLDataNodes.x.value('@name', 'varchar(50)') as FieldName,
XMLDataNodes.x.value('@value', 'varchar(500)') as FieldValue,
XMLNodes.x.value('1 count(for $a in . return $a/../*[. << $a])','int') as parentID
from
@XmlFile.nodes ('/xml/aggregateddata/aggregateddata') as XMLNodes(x)
cross apply XMLNodes.x.nodes('item') as XMLDataNodes(x)
Is there a way to either make the ID value much quicker, or pull the data out directly, without the need for this step (or the pivot afterwards)?
CodePudding user response:
Please try the following solution.
SQL Server's XQuery is very powerful.
The main idea is to use XPath with a predicate:
item[@name="..."]/@value
SQL
DECLARE @xml XML =
N'<xml>
<aggregateddata>
<aggregateddata>
<item value="abcdefg1" name="id"/>
<item value="1" name="dataSet"/>
<item value="Aggregates" name="dataSetLabel"/>
<item value="Physical Flow" name="indicator"/>
<item value="day" name="periodType"/>
<item value="2021-10-16T06:00:00 02:00" name="periodFrom"/>
<item value="2021-10-17T06:00:00 02:00" name="periodTo"/>
<item value="BE" name="countryKey"/>
<item value="L-Zone" name="bzShort"/>
</aggregateddata>
<aggregateddata>
<item value="abcdefg2" name="id"/>
<item value="1" name="dataSet"/>
<item value="Aggregates" name="dataSetLabel"/>
<item value="Physical Flow" name="indicator"/>
<item value="day" name="periodType"/>
<item value="2021-10-16T06:00:00 02:00" name="periodFrom"/>
<item value="2021-10-17T06:00:00 02:00" name="periodTo"/>
<item value="UK" name="countryKey"/>
<item value="L-Zone" name="bzShort"/>
</aggregateddata>
</aggregateddata>
</xml>';
SELECT c.value('(item/@value)[1]', 'varchar(50)') as id
, c.value('(item[@name="dataSet"]/@value)[1]', 'varchar(500)') as dataSet
, c.value('(item[@name="dataSetLabel"]/@value)[1]', 'varchar(500)') as dataSetLabel
, c.value('(item[@name="indicator"]/@value)[1]', 'varchar(500)') as indicator
, c.value('(item[@name="periodType"]/@value)[1]', 'varchar(500)') as periodType
, c.value('(item[@name="periodFrom"]/@value)[1]', 'datetimeoffset(0)') as periodFrom
, c.value('(item[@name="periodTo"]/@value)[1]', 'datetimeoffset(0)') as periodTo
, c.value('(item[@name="countryKey"]/@value)[1]', 'CHAR(2)') as countryKey
, c.value('(item[@name="bzShort"]/@value)[1]', 'VARCHAR(20)') as bzShort
FROM @xml.nodes('/xml/aggregateddata/aggregateddata') as t(c);
Output
---------- --------- -------------- --------------- ------------ ---------------------------- ---------------------------- ------------ ---------
| id | dataSet | dataSetLabel | indicator | periodType | periodFrom | periodTo | countryKey | bzShort |
---------- --------- -------------- --------------- ------------ ---------------------------- ---------------------------- ------------ ---------
| abcdefg1 | 1 | Aggregates | Physical Flow | day | 2021-10-16 06:00:00 02:00 | 2021-10-17 06:00:00 02:00 | BE | L-Zone |
| abcdefg2 | 1 | Aggregates | Physical Flow | day | 2021-10-16 06:00:00 02:00 | 2021-10-17 06:00:00 02:00 | UK | L-Zone |
---------- --------- -------------- --------------- ------------ ---------------------------- ---------------------------- ------------ ---------
CodePudding user response:
Something like this might help (and should be faster than a dynamic XQuery, at least):
select
XmlDataNodes.x.value('(item[@name="id"]/@value)[1]', 'varchar(500)') as Id,
Items.*
from
@XmlFile.nodes ('/xml/aggregateddata/aggregateddata') as XMLDataNodes(x)
cross apply (
select
ItemNodes.x.value('@name', 'varchar(50)') as FieldName,
ItemNodes.x.value('@value', 'varchar(500)') as FieldValue
from XMLDataNodes.x.nodes('item') ItemNodes(x)
) Items
Result:
Id | FieldName | FieldValue |
---|---|---|
abcdefg1 | id | abcdefg1 |
abcdefg1 | dataSet | 1 |
abcdefg1 | dataSetLabel | Aggregates |
abcdefg1 | indicator | Physical Flow |
... | ... | ... |