TL;DR - in .NET, using XmlDocument/XDocument, is there an easy way (XPath?) to find CDATA nodes so they can be removed and their contents encoded?
Details...
My system has lots of places where it builds XML strings manually (e.g. by string concatenation, rather than building them via XmlDocument or XDocument), and these strings can contain multiple <![CDATA[...]]> nodes at any level of the structure, e.g.
<data><one><![CDATA[ab&cd]]></one><two><inner><![CDATA[xy<z]]></inner></two></data>
When this data is stored in a SQLServer XML column, the <![CDATA[...]]> wrapper is automatically removed and the inner text is entity-encoded. This is standard for SQLServer, which doesn't "do" CDATA.
My issue is that I have complex code that takes two instances of a class and audit-trails the differences between them, and one or more of their properties could be a string containing XML.
This results in a mismatch (and therefore an audit-trail entry) when nothing has actually changed, because the code creates one form of the XML and SQLServer returns a different one, e.g.
// Manually generated XML string...
<data><one><![CDATA[ab&cd]]></one><two><inner><![CDATA[xy<z]]></inner></two></data>
// SQLServer returned string...
<data><one>ab&amp;cd</one><two><inner>xy&lt;z</inner></two></data>
Is there an easy way in .NET to process the manually generated XML and convert each CDATA node into its encoded equivalent, so I can compare the string with the one returned by SQLServer?
Is there a SelectNodes XPath expression that would find all of those nodes?
(And before anybody says it: the obvious solution is to not use CDATA when building the XML manually in the first place. However, this is not possible due to the sheer number of instances.)
Answer:
This is easy with one foreach loop and ReplaceChild:
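using System;
using System.Linq;
using System.Xml;
var doc = new XmlDocument();
doc.LoadXml(@"<data><one><![CDATA[ab&cd]]></one><two><inner><![CDATA[xy<z]]></inner></two><three><inner>a &lt; b</inner></three></data>");
// CDATA sections are text nodes in XPath, so //text() finds them; ToList() snapshots
// the matches before the tree is modified.
foreach (var cdata in doc.SelectNodes("//text()").OfType<XmlCDataSection>().ToList())
{
    // Plain text node with the same data; it is escaped (&amp;, &lt;) when serialized.
    cdata.ParentNode.ReplaceChild(doc.CreateTextNode(cdata.Data), cdata);
}
Console.WriteLine(doc.OuterXml);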
Output:
<data><one>ab&amp;cd</one><two><inner>xy&lt;z</inner></two><three><inner>a &lt; b</inner></three></data>
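Since the question also mentions XDocument, the same technique can be sketched with LINQ to XML and wrapped in a helper that normalizes both strings before the audit-trail comparison. The helper name `NormalizeCData` is hypothetical, and like the snippet above this assumes C# top-level statements (C# 9 / .NET 5 or later). Each `XCData` node is replaced with a plain `XText` holding the same value, so serialization escapes it the same way SQLServer does for these inputs.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: parse, swap every CDATA section for an ordinary text node
// with the same value, and serialize without indentation for string comparison.
string NormalizeCData(string xml)
{
    var doc = XDocument.Parse(xml);

    // ToList() materializes the query before the tree is modified.
    foreach (var cdata in doc.DescendantNodes().OfType<XCData>().ToList())
    {
        cdata.ReplaceWith(new XText(cdata.Value));
    }

    // XDocument.ToString does not emit an XML declaration.
    return doc.ToString(SaveOptions.DisableFormatting);
}

var manual = "<data><one><![CDATA[ab&cd]]></one><two><inner><![CDATA[xy<z]]></inner></two></data>";
var fromSql = "<data><one>ab&amp;cd</one><two><inner>xy&lt;z</inner></two></data>";

Console.WriteLine(NormalizeCData(manual));
Console.WriteLine(NormalizeCData(manual) == NormalizeCData(fromSql)); // True
```

Running both sides through the same helper means formatting differences other than CDATA (for example, how a parser re-serializes entities) are normalized too, instead of comparing a processed string against a raw one.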
Another option would be to run the XML through an XSLT identity transformation with XslCompiledTransform, using e.g. this stylesheet:
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
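To apply that stylesheet from C#, a minimal sketch could look like this. It assumes the stylesheet is held in a string, uses a hypothetical helper name `StripCDataWithXslt`, and uses top-level statements as in the first snippet. Because the XSLT data model has no CDATA nodes, the copied text is written out as ordinary escaped text.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// The identity stylesheet shown above, kept in a string for this sketch.
const string IdentityXslt = @"<xsl:stylesheet xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"" version=""1.0"">
  <xsl:template match=""@* | node()"">
    <xsl:copy>
      <xsl:apply-templates select=""@* | node()""/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>";

// Hypothetical helper: copies the input unchanged except that CDATA sections
// come out as escaped text.
string StripCDataWithXslt(string xml)
{
    var transform = new XslCompiledTransform();
    using (var xsltReader = XmlReader.Create(new StringReader(IdentityXslt)))
    {
        transform.Load(xsltReader);
    }

    // Omit the XML declaration so the result lines up with the string SQLServer returns.
    var settings = new XmlWriterSettings { OmitXmlDeclaration = true };

    var output = new StringWriter();
    using (var input = XmlReader.Create(new StringReader(xml)))
    using (var writer = XmlWriter.Create(output, settings))
    {
        transform.Transform(input, writer);
    }
    return output.ToString();
}

Console.WriteLine(StripCDataWithXslt(
    "<data><one><![CDATA[ab&cd]]></one><two><inner><![CDATA[xy<z]]></inner></two></data>"));
```

Loading the stylesheet on every call keeps the sketch short; when processing many strings, the XslCompiledTransform would normally be loaded once and reused, since it is safe to call Transform concurrently after Load.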