Home > Net >  Giving unique identifier to ambiguous parent level when reading XML data
Giving unique identifier to ambiguous parent level when reading XML data

Time:10-19

I have an XML document, which I've simplified for this question, in the following format:

<?xml version="1.0"?>
<xml>
    <aggregateddata>
        <aggregateddata>
            <item value="abcdefg1" name="id"/>
            <item value="1" name="dataSet"/>
            <item value="Aggregates" name="dataSetLabel"/>
            <item value="Physical Flow" name="indicator"/>
            <item value="day" name="periodType"/>
            <item value="2021-10-16T06:00:00 02:00" name="periodFrom"/>
            <item value="2021-10-17T06:00:00 02:00" name="periodTo"/>
            <item value="BE" name="countryKey"/>
            <item value="L-Zone" name="bzShort"/>
        </aggregateddata>
        <aggregateddata>
            <item value="abcdefg2" name="id"/>
            <item value="1" name="dataSet"/>
            <item value="Aggregates" name="dataSetLabel"/>
            <item value="Physical Flow" name="indicator"/>
            <item value="day" name="periodType"/>
            <item value="2021-10-16T06:00:00 02:00" name="periodFrom"/>
            <item value="2021-10-17T06:00:00 02:00" name="periodTo"/>
            <item value="UK" name="countryKey"/>
            <item value="L-Zone" name="bzShort"/>
        </aggregateddata>
    </aggregateddata>
</xml>

I want to be able to read this, ideally having each group of values on its own row. Turning it into something more SQL server friendly Like this:

id dataSet dataSetLabel indicator periodType periodFrom periodTo countryKey bzShort
abcdefg1 1 Aggregates Physical Flow day 2021-10-16T06:00:00 02:00 2021-10-17T06:00:00 02:00 BE L-Zone
abcdefg2 1 Aggregates Physical Flow day 2021-10-16T06:00:00 02:00 2021-10-17T06:00:00 02:00 UK L-Zone

To do this, I though I could read the XML using the following:

select
    XMLDataNodes.x.value('@name', 'varchar(50)') as FieldName,
    XMLDataNodes.x.value('@value', 'varchar(500)') as FieldValue
from 
    @XmlFile.nodes ('/xml/aggregateddata/aggregateddata/item') as XMLDataNodes(x)

And then use some sort of dynamic pivot to then turn the results into what I need.

The problem though, is there is nothing I can 'group by' - basically no value on the parent note to put next to them. I have tried something like this https://www.sqlservercentral.com/forums/topic/how-to-uniquely-number-parent-and-child-nodes-while-reading-an-xml-document, to add an identifier to each group, but it is incredibly slow on the actual full dataset:

select
    XMLDataNodes.x.value('@name', 'varchar(50)') as FieldName,
    XMLDataNodes.x.value('@value', 'varchar(500)') as FieldValue,
    XMLNodes.x.value('1 count(for $a in . return $a/../*[. << $a])','int') as parentID
from 
    @XmlFile.nodes ('/xml/aggregateddata/aggregateddata') as XMLNodes(x)
    cross apply XMLNodes.x.nodes('item') as XMLDataNodes(x)

Is there a way to either make the ID value much quicker, or pull the data out directly, without the need for this step (or the pivot afterwards)?

CodePudding user response:

Please try the following solution.

SQL Server's XQuery is very powerful.

The main idea is to use XPath with a predicate:

item[@name="..."]/@value

SQL

DECLARE @xml XML =
N'<xml>
    <aggregateddata>
        <aggregateddata>
            <item value="abcdefg1" name="id"/>
            <item value="1" name="dataSet"/>
            <item value="Aggregates" name="dataSetLabel"/>
            <item value="Physical Flow" name="indicator"/>
            <item value="day" name="periodType"/>
            <item value="2021-10-16T06:00:00 02:00" name="periodFrom"/>
            <item value="2021-10-17T06:00:00 02:00" name="periodTo"/>
            <item value="BE" name="countryKey"/>
            <item value="L-Zone" name="bzShort"/>
        </aggregateddata>
        <aggregateddata>
            <item value="abcdefg2" name="id"/>
            <item value="1" name="dataSet"/>
            <item value="Aggregates" name="dataSetLabel"/>
            <item value="Physical Flow" name="indicator"/>
            <item value="day" name="periodType"/>
            <item value="2021-10-16T06:00:00 02:00" name="periodFrom"/>
            <item value="2021-10-17T06:00:00 02:00" name="periodTo"/>
            <item value="UK" name="countryKey"/>
            <item value="L-Zone" name="bzShort"/>
        </aggregateddata>
    </aggregateddata>
</xml>';

SELECT c.value('(item/@value)[1]', 'varchar(50)') as id
    , c.value('(item[@name="dataSet"]/@value)[1]', 'varchar(500)') as dataSet
    , c.value('(item[@name="dataSetLabel"]/@value)[1]', 'varchar(500)') as dataSetLabel
    , c.value('(item[@name="indicator"]/@value)[1]', 'varchar(500)') as indicator
    , c.value('(item[@name="periodType"]/@value)[1]', 'varchar(500)') as periodType
    , c.value('(item[@name="periodFrom"]/@value)[1]', 'datetimeoffset(0)') as periodFrom
    , c.value('(item[@name="periodTo"]/@value)[1]', 'datetimeoffset(0)') as periodTo
    , c.value('(item[@name="countryKey"]/@value)[1]', 'CHAR(2)') as countryKey
    , c.value('(item[@name="bzShort"]/@value)[1]', 'VARCHAR(20)') as bzShort
FROM @xml.nodes('/xml/aggregateddata/aggregateddata') as t(c);

Output

 ---------- --------- -------------- --------------- ------------ ---------------------------- ---------------------------- ------------ --------- 
|    id    | dataSet | dataSetLabel |   indicator   | periodType |         periodFrom         |          periodTo          | countryKey | bzShort |
 ---------- --------- -------------- --------------- ------------ ---------------------------- ---------------------------- ------------ --------- 
| abcdefg1 |       1 | Aggregates   | Physical Flow | day        | 2021-10-16 06:00:00  02:00 | 2021-10-17 06:00:00  02:00 | BE         | L-Zone  |
| abcdefg2 |       1 | Aggregates   | Physical Flow | day        | 2021-10-16 06:00:00  02:00 | 2021-10-17 06:00:00  02:00 | UK         | L-Zone  |
 ---------- --------- -------------- --------------- ------------ ---------------------------- ---------------------------- ------------ --------- 

CodePudding user response:

Something like this might help (and should be faster than a dynamic XQuery, at least):

select 
    XmlDataNodes.x.value('(item[@name="id"]/@value)[1]', 'varchar(500)') as Id,
    Items.*
from 
    @XmlFile.nodes ('/xml/aggregateddata/aggregateddata') as XMLDataNodes(x)
    cross apply (
       select
         ItemNodes.x.value('@name', 'varchar(50)') as FieldName,
         ItemNodes.x.value('@value', 'varchar(500)') as FieldValue
       from XMLDataNodes.x.nodes('item') ItemNodes(x)
    ) Items

Result:

Id FieldName FieldValue
abcdefg1 id abcdefg1
abcdefg1 dataSet 1
abcdefg1 dataSetLabel Aggregates
abcdefg1 indicator Physical Flow
... ... ...
  • Related