I am trying to pull data from an XML file using SQL / XPath
For my example lets say that the table is called [Books] and the column that contains the XML is called [XML]
Example XML:
<Books>
<Book>
<Title>Apology</Title>
<Author>Socrates</Author>
</Book>
<Book>
<Title>Republic</Title>
<Author>Plato</Author>
</Book>
<Book>
<Title>Politics</Title>
<Author>Aristotle</Author>
</Book>
</Books>
Desired Result:
Title | Author |
---|---|
Apology | Socrates |
Republic | Plato |
Politics | Aristotle |
I am able to pull the distinct 'Titles' with a query like this:
SELECT
a.x.value('text()[1]', 'VARCHAR(100)') AS 'Title'
FROM [dbo].[Books]
CROSS APPLY [XML].nodes('//Books/Book/Title') AS a (x)
Result:
Title |
---|
Apology |
Republic |
Politics |
However, I am unable to pull the other nodes while maintaining their relationship with eachother.
For example, if i add a second 'Cross Apply':
SELECT
a.x.value('text()[1]', 'VARCHAR(100)') AS 'Title',
b.x.value('text()[1]', 'VARCHAR(100)') AS 'Author'
FROM [dbo].[Books]
CROSS APPLY [XML].nodes('//Books/Book/Title') AS a (x)
CROSS APPLY [XML].nodes('//Books/Book/Author') AS b (x)
The result will be something like this:
Title | Author |
---|---|
Apology | Socrates |
Apology | Plato |
Apology | Aristotle |
Republic | Plato |
Republic | Socrates |
Republic | Aristotle |
Politics | Aristotle |
Politics | Plato |
Politics | Socrates |
I have also tried to pull both nodes using one cross apply:
SELECT
a.x.value('(//Title/text())[1]', 'VARCHAR(100)') AS 'Title',
a.x.value('(//Author/text())[1]', 'VARCHAR(100)') AS 'Author'
FROM [dbo].[Books]
CROSS APPLY [XML].nodes('//Books/Book') AS a (x)
When i try this i just get the first node duplicated:
Title | Author |
---|---|
Apology | Socrates |
Apology | Socrates |
Apology | Socrates |
Is there any way i can preserve the relationship between these 2 nodes?
Thank you
CodePudding user response:
In this particular instance, you don't need to correlate anything. You can just shred the Books/Book
node, then select each value separately
SELECT
x1.Book.value('(Title/text())[1]', 'VARCHAR(100)') AS Title,
x1.Book.value('(Author/text())[1]', 'VARCHAR(100)') AS Author
FROM dbo.Books b
CROSS APPLY b.XML.nodes('Books/Book') AS x1 (book);
If you really need to correlate them (for example if there are multiple Title
nodes), you can just pass the result of one .nodes
to the next
SELECT
x2.Title.value('text()[1]', 'VARCHAR(100)') AS Title,
x3.Author.value('text()[1]', 'VARCHAR(100)') AS Author
FROM dbo.Books b
CROSS APPLY b.XML.nodes('Books/Book') AS x1 (Book)
CROSS APPLY x1.Book.nodes('Title') AS x2 (Title)
CROSS APPLY x1.Book.nodes('Author') AS x3 (Author);
Also note that the //
all-descendants axis is slow, and you should favour the /
direct-child axis if possible.