I have an issue with parsing text with special characters from XML in SQL Server.
Let's say I have a XML file Sample.xml
which has the following data:
<People>
<Person FirstName="Adam"
LastName="Smith"
Age="44"
Weight="178">
<Texts>
<Text Country="US"
Language="EN"
TextType="1"><div>First sentence to retrieve.</div></Text>
<Text Country="GB"
Language="EN"
TextType="2"><div>Second sentence to retrieve.</div></Text>
</Texts>
</Person>
</People>
I prepared the following SQL script which can parse everything except two sentences in the <TextType>
attribute:
- First sentence to retrieve
- Second sentence to retrieve
DECLARE @x XML
SELECT @x = f FROM OPENROWSET(BULK 'C:\Sample.xml', single_blob) AS C(f)
DECLARE @hdoc int
EXEC sp_xml_preparedocument @hdoc OUTPUT, @x
SELECT * FROM OPENXML (@hdoc, '/People/Person/Texts/Text')
WITH (
FirstName varchar(max) '../../@FirstName'
, LastName varchar(max) '../../@LastName'
, Age varchar(max) '../../@Age'
, [Weight] varchar(max) '../../@Weight'
, Country varchar(max) '@Country'
, [Language] varchar(max) '@Language'
, TextType varchar(max) '@TextType'
)
EXEC sp_xml_removedocument @hdoc
Could you please help me to add the column with the sentences mentioned above?
CodePudding user response:
OPENXML
is old and basically deprecated, it has numerous issues.
You should use the newer XQuery functions .nodes
and .value
to retrieve your data.
Your primary issue is that you have XML stored as string inside another XML. You need to retrieve it as nvarchar(max)
, then cast it using TRY_CONVERT
.
SELECT
FirstName = x1.Person.value('@FirstName', 'varchar(100)'),
LastName = x1.Person.value('@LastName' , 'varchar(100)'),
Age = x1.Person.value('@Age' , 'int'),
Weight = x1.Person.value('@Weight' , 'decimal(9,5)'),
Country = x2.Text.value('@Country' , 'char(2)'),
[Language] = x2.Text.value('@Language', 'char(2)'),
TextType = x2.Text.value('@TextType', 'int'),
value = v.InnerXml.value('(div/text())[1]','nvarchar(max)')
FROM @x.nodes('People/Person') x1(Person)
CROSS APPLY x1.Person.nodes('Texts/Text') x2(Text)
CROSS APPLY (VALUES( TRY_CONVERT(xml, x2.Text.value('text()[1]','nvarchar(max)')) )) v(InnerXml);
Note the way there are two calls to .nodes
, and one feeds into the next.
You can even feed this in straight from OPENROWSET
SELECT
FirstName = x1.Person.value('@FirstName', 'varchar(100)'),
LastName = x1.Person.value('@LastName' , 'varchar(100)'),
Age = x1.Person.value('@Age' , 'int'),
Weight = x1.Person.value('@Weight' , 'decimal(9,5)'),
Country = x2.Text.value('@Country' , 'char(2)'),
[Language] = x2.Text.value('@Language', 'char(2)'),
TextType = x2.Text.value('@TextType', 'int'),
value = v.InnerXml.value('(div/text())[1]','nvarchar(max)')
FROM OPENROWSET(BULK 'C:\Sample.xml', single_blob) AS C(f)
CROSS APPLY (VALUES( TRY_CONVERT(xml, C.f) )) C2(AsXml)
CROSS APPLY C2.AsXml.nodes('People/Person') x1(Person)
CROSS APPLY x1.Person.nodes('Texts/Text') x2(Text)
CROSS APPLY (VALUES( TRY_CONVERT(xml, x2.Text.value('text()[1]','nvarchar(max)')) )) v(InnerXml);
I recommend you fix whatever is generating this XML, ideally you would pass through the inner XML without stringifying it.