Converting special characters to hexadecimal character references when setting XML InnerText in C#

Time:05-24

When writing an element's inner text, I need special characters to appear as hexadecimal character references, but I cannot get this to work. I tried changing the encoding, without success. I need output like:

&#x2013;CO&#x2013;OR instead of "–CO–OR"

"&#x2B;" instead of "+"

The code I am using for the conversion is below.

else
{
  //convertedStr = System.Net.WebUtility.HtmlDecode(runText);
  Encoding iso = Encoding.Default; 
  Encoding utf8 = Encoding.Unicode;
  byte[] utfBytes = utf8.GetBytes(runText);
  byte[] isoBytes = Encoding.Convert(iso, utf8, utfBytes);
  string msg = iso.GetString(isoBytes);    
  eqnPartElm = clsGlobal.XMLDoc.CreateElement("inf");
  eqnPartElm.InnerText = msg;
  eqnElm.AppendChild(eqnPartElm);   
}

Answer:

Escaping of Unicode characters is not controlled by XmlDocument. Instead, XmlWriter will escape characters not supported by the current encoding, as specified by XmlWriterSettings.Encoding, at the time the document is written to a stream. If you want all "special characters" such as the En Dash to be escaped, choose a very restrictive encoding such as Encoding.ASCII.
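
To make the effect of the encoding concrete, here is a minimal sketch that writes the same document twice, once as ASCII and once as UTF-8 without a byte order mark. The class name EncodingEscapeDemo and its methods are hypothetical names chosen for illustration, not part of any library.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class EncodingEscapeDemo
{
    // Writes the node through an XmlWriter configured with the given encoding,
    // then decodes the resulting bytes back into a string for inspection.
    public static string Serialize(XmlNode node, Encoding encoding)
    {
        var settings = new XmlWriterSettings
        {
            Encoding = encoding,
            OmitXmlDeclaration = true,
        };
        using var stream = new MemoryStream();
        using (var writer = XmlWriter.Create(stream, settings))
        {
            node.WriteTo(writer);
        }
        return encoding.GetString(stream.ToArray());
    }

    // Hypothetical entry point for trying the sketch out.
    public static void Run()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<inf/>");
        doc.DocumentElement.InnerText = "\u2013CO\u2013OR"; // \u2013 is the En Dash

        // ASCII cannot represent the En Dash, so the writer escapes it.
        Console.WriteLine(Serialize(doc, Encoding.ASCII));
        // UTF-8 can represent it, so it is written as a literal character.
        Console.WriteLine(Serialize(doc, new UTF8Encoding(false)));
    }
}
```

The first line printed should contain &#x2013; in place of each En Dash, while the second keeps the literal character. The in-memory XmlDocument is identical in both cases; only the writer's encoding differs.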

To do this easily, create the following extension methods:

using System.IO;
using System.Text;
using System.Xml;

public static class XmlSerializationHelper
{
    public static string GetOuterXml(this XmlNode node, bool indent = false, Encoding encoding = null, bool omitXmlDeclaration = false)
    {
        if (node == null)
            return null;
        using var stream = new MemoryStream();
        node.Save(stream, indent : indent, encoding : encoding, omitXmlDeclaration : omitXmlDeclaration, closeOutput : false);
        stream.Position = 0;
        using var reader = new StreamReader(stream);
        return reader.ReadToEnd();
    }

    public static void Save(this XmlNode node, Stream stream, bool indent = false, Encoding encoding = null, bool omitXmlDeclaration = false, bool closeOutput = true) =>
        node.Save(stream, new XmlWriterSettings
                  {
                      Indent = indent,
                      Encoding = encoding ?? Encoding.UTF8, // UTF-8 is the XmlWriterSettings default
                      OmitXmlDeclaration = omitXmlDeclaration,
                      CloseOutput = closeOutput,
                  });

    public static void Save(this XmlNode node, Stream stream, XmlWriterSettings settings)
    {
        using (var xmlWriter = XmlWriter.Create(stream, settings))
        {
            node.WriteTo(xmlWriter);
        }
    }
}

And now you will be able to do the following to serialize an XmlDocument to a string with non-ASCII characters escaped:

// Construct your XmlDocument (not shown in the question)
var xmlDoc = new XmlDocument();
xmlDoc.LoadXml("<Root></Root>");
var eqnPartElm = xmlDoc.CreateElement("inf");
xmlDoc.DocumentElement.AppendChild(eqnPartElm);

// Add some non-ASCII text (here – is an En Dash character).
eqnPartElm.InnerText = "–CO–OR";

// Output to XML and escape all non-ASCII characters.
var xml = xmlDoc.GetOuterXml(indent : true, encoding : Encoding.ASCII, omitXmlDeclaration : true);

To serialize to a Stream, do:

// FileMode.Create truncates an existing file, so no stale bytes remain after the new XML.
using (var stream = new FileStream(fileName, FileMode.Create))
{
    xmlDoc.Save(stream, indent : true, encoding : Encoding.ASCII, omitXmlDeclaration : true);
}

And the following XML will be created:

<Root>
  <inf>&#x2013;CO&#x2013;OR</inf>
</Root>
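
The escaped form is only a different spelling of the same characters, so nothing is lost. As a quick check, this sketch loads the XML shown above back into an XmlDocument and reads the text of the inf element. The class name RoundTripCheck and its method are hypothetical, used only for illustration.

```csharp
using System;
using System.Xml;

public static class RoundTripCheck
{
    // Parses the escaped XML and returns the decoded text of the <inf> element.
    public static string ReadInfText(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.SelectSingleNode("/Root/inf").InnerText;
    }

    // Hypothetical entry point for trying the sketch out.
    public static void Run()
    {
        var text = ReadInfText("<Root><inf>&#x2013;CO&#x2013;OR</inf></Root>");
        // The parser resolves the character references back to En Dash characters.
        Console.WriteLine(text == "\u2013CO\u2013OR");
    }
}
```

Any conforming XML parser should resolve the references the same way, so consumers of the file see the original text.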

Note that you must use a writer created with XmlWriter.Create(), not the older XmlTextWriter, because the latter does not support replacing unsupported characters with escaped fallbacks.

