How to extract multiple tag values from XML column in SQL Server

Time:12-29

I would like to know how to extract multiple values from a single XML row. The problem is that this XML value sometimes has repeated child tags (name, id, email), for example:

<foo>
    <name>
        Dacely Lara Camilo
    </name>
    <id>
        001-1942098-2
    </id>
    <email>
        [email protected]
    </email>
</foo>
<foo>
    <name>
        Alba Elvira Castro
    </name>
    <id>
        001-0327959-2
    </id>
    <email>
        [email protected]
    </email>
</foo>

Or sometimes the data in that column can look like this:

<foo>
    <name>
        Nelson Antonio Jimenez
    </name>
    <id>
        001-0329459-3
    </id>
    <email>
        [email protected]
    </email>
</foo>
<foo>
    <name>
        Emelinda Serrano
    </name>
    <id>
        001-0261732-4
    </id>
    <email>
        [email protected]
    </email>
</foo>
<foo>
    <name>
        Nelson Antonio Jimenez
    </name>
    <id>
        001-0329259-3
    </id>
    <email>
        [email protected]
    </email>
</foo>
<foo>
    <name>
        Emelinda Serrano
    </name>
    <id>
        001-0268332-4
    </id>
    <email>
        [email protected]
    </email>
</foo>

And I want all of them to be transposed into rows, as in the expected result below:

(image: expected output, not available)

My current code extracts only the first pair, in case it helps:

WITH BASEDATA (ID, SIGNATURE, X) AS (
    SELECT TOP 50
        A.ID_SIGNATURE,
        A.SIGNATURE,
        A.XML
    FROM DWH.DIM_CORE_SIGNATURE A
)
SELECT
    ID,
    A.value('(id)[1]', 'nvarchar(max)') AS ID_SIGNATURE,
    A.value('(name)[1]', 'nvarchar(max)') AS NAME,
    A.value('(email)[1]', 'nvarchar(max)') AS EMAIL
FROM BASEDATA
CROSS APPLY X.nodes('//foo') AS SIGNATURE(A)

Answer:

Notable points:

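  • .nodes('/foo') uses a more performant XPath expression than '//foo': it matches only the top-level foo elements instead of searching every level of the document for descendants.
  • It is better to use .value('(id/text())[1]', ...) for the same reason: the path points directly at the element's text node.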
  • As @Lamu already suggested, it is better to use real data types instead of nvarchar(max) across the board.
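
To make the first point concrete, here is a minimal sketch of how the two path expressions differ. The nested XML in @x is made up for illustration and does not come from the question's data:

```sql
-- Hypothetical sample: one top-level <foo> that contains a nested <foo>.
DECLARE @x XML = N'<foo>
    <name>Outer</name>
    <foo>
        <name>Inner</name>
    </foo>
</foo>';

-- '/foo' matches only the top-level element: 1 row.
SELECT COUNT(*) AS top_level_only
FROM @x.nodes('/foo') AS t(c);

-- '//foo' matches <foo> at any depth: 2 rows.
SELECT COUNT(*) AS any_depth
FROM @x.nodes('//foo') AS t(c);
```

With the flat XML from the question both expressions return the same rows, so the difference is only the amount of searching the engine has to do.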

SQL

-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, xmldata XML);
INSERT INTO @tbl (xmldata) VALUES
(N'<foo>
    <name>Dacely Lara Camilo</name>
    <id>001-1942098-2</id>
    <email>[email protected]</email>
</foo>
<foo>
    <name>Alba Elvira Castro</name>
    <id>001-0327959-2</id>
    <email>[email protected]</email>
</foo>')
, (N'<foo>
    <name>Nelson Antonio Jimenez</name>
    <id>001-0329459-3</id>
    <email>[email protected]</email>
</foo>
<foo>
    <name>Emelinda Serrano</name>
    <id>001-0261732-4</id>
    <email>[email protected]</email>
</foo>
<foo>
    <name>Nelson Antonio Jimenez</name>
    <id>001-0329259-3</id>
    <email>[email protected]</email>
</foo>
<foo>
    <name>Emelinda Serrano</name>
    <id>001-0268332-4</id>
    <email>[email protected]</email>
</foo>');
-- DDL and sample data population, end

SELECT ID,
    c.value('(id/text())[1]', 'char(13)') AS ID_SIGNATURE,
    c.value('(name/text())[1]', 'nvarchar(30)') AS NAME,
    c.value('(email/text())[1]', 'nvarchar(128)') AS EMAIL
FROM @tbl
    CROSS APPLY xmldata.nodes('/foo') AS t(c);

Output

+----+---------------+------------------------+-------------------+
| ID | ID_SIGNATURE  | NAME                   | EMAIL             |
+----+---------------+------------------------+-------------------+
|  1 | 001-1942098-2 | Dacely Lara Camilo     | [email protected] |
|  1 | 001-0327959-2 | Alba Elvira Castro     | [email protected] |
|  2 | 001-0329459-3 | Nelson Antonio Jimenez | [email protected] |
|  2 | 001-0261732-4 | Emelinda Serrano       | [email protected] |
|  2 | 001-0329259-3 | Nelson Antonio Jimenez | [email protected] |
|  2 | 001-0268332-4 | Emelinda Serrano       | [email protected] |
+----+---------------+------------------------+-------------------+
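
One caveat: the sample data above has no whitespace inside the elements, while the XML shown in the question puts each value on its own indented line. If the stored XML really looks like that, the extracted strings will carry the line breaks and indentation, and a narrow type such as char(13) would be silently truncated. The following is a rough sketch of one way to trim the values, assuming the indentation consists of spaces; the variable @x and the nvarchar(200) width are placeholders, not part of the original answer:

```sql
-- Hypothetical sample shaped like the XML in the question,
-- with line breaks and indentation inside each element.
DECLARE @x XML = N'<foo>
    <name>
        Dacely Lara Camilo
    </name>
    <id>
        001-1942098-2
    </id>
</foo>';

SELECT
    -- Read into a generous type first, strip CR/LF, then trim the spaces.
    LTRIM(RTRIM(REPLACE(REPLACE(
        c.value('(id/text())[1]', 'nvarchar(200)'),
        CHAR(13), ''), CHAR(10), ''))) AS ID_SIGNATURE,
    LTRIM(RTRIM(REPLACE(REPLACE(
        c.value('(name/text())[1]', 'nvarchar(200)'),
        CHAR(13), ''), CHAR(10), ''))) AS NAME
FROM @x.nodes('/foo') AS t(c);
```

If the indentation uses tabs, an additional REPLACE for CHAR(9) would be needed. The trimmed result can then be cast to the narrower target type.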