UTF-8 remove BOM

Time:11-09

I have an XML file with a UTF-8 BOM at the beginning of the file, which prevents me from using existing code that reads UTF-8 files.

How can I remove the BOM from the XML file in an easy way?

Here I have a variable xmlfile, a byte array that I convert to a string. xmlfile contains the entire XML file.

 byte[] xmlfile = ((Byte[])myReader["xmlSQL"]);

 string xmlstring = Encoding.UTF8.GetString(xmlfile);

CodePudding user response:

Great stuff, DBC :) That worked well with your link. To fix my problem of a UTF-8 BOM at the beginning of my XML file, I simply added a MemoryStream and a StreamReader, which automatically strips the BOM from the file contents (htmlbytes). Really easy to implement in existing code.

 byte[] htmlbytes = ((Byte[])myReader["xmlMelding"]);
 // StreamReader detects a leading BOM by default and leaves it out of the returned text.
 var memorystream = new MemoryStream(htmlbytes);
 var s = new StreamReader(memorystream).ReadToEnd();
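
For reuse, the same idea could be wrapped in a small helper that also disposes the stream and reader. This is only a sketch: the class name BomReader and the method name ReadTextSkippingBom are made up for illustration, and it assumes the bytes are UTF-8 (the default encoding of this StreamReader constructor).

```csharp
using System.IO;

public static class BomReader
{
    // Hypothetical helper (name is an assumption, not from the thread).
    // StreamReader(Stream) defaults to UTF-8 and detects byte order marks,
    // so a leading BOM does not appear in the returned string.
    public static string ReadTextSkippingBom(byte[] bytes)
    {
        using (var stream = new MemoryStream(bytes))
        using (var reader = new StreamReader(stream))
        {
            return reader.ReadToEnd();
        }
    }
}
```

With that in place, the call would look like: var s = BomReader.ReadTextSkippingBom(htmlbytes);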

CodePudding user response:

Encoding.GetString() has an overload that accepts an offset into the byte[] array. Simply check whether the array starts with a BOM, and if so, skip it when calling GetString(), e.g.:

byte[] xmlfile = ((Byte[])myReader["xmlSQL"]);
int offset = 0;

if (xmlfile.Length >= 3 &&
    xmlfile[0] == 0xEF &&
    xmlfile[1] == 0xBB &&
    xmlfile[2] == 0xBF)
{
    offset = 3;
}
}

string xmlstring = Encoding.UTF8.GetString(xmlfile, offset, xmlfile.Length - offset);
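
A variation on the same check, offered as a sketch rather than a definitive version: instead of hard-coding the three BOM bytes, compare against Encoding.UTF8.GetPreamble(), which returns EF BB BF. The class name BomHelper and the method name DecodeUtf8WithoutBom are hypothetical names chosen for this example.

```csharp
using System.Linq;
using System.Text;

public static class BomHelper
{
    // Hypothetical helper (name is an assumption, not from the thread).
    // Decodes UTF-8 bytes, skipping a leading BOM if one is present.
    public static string DecodeUtf8WithoutBom(byte[] bytes)
    {
        byte[] bom = Encoding.UTF8.GetPreamble(); // EF BB BF for Encoding.UTF8

        // Skip the preamble only when the array is long enough and starts with it.
        bool hasBom = bytes.Length >= bom.Length &&
                      bytes.Take(bom.Length).SequenceEqual(bom);
        int offset = hasBom ? bom.Length : 0;

        return Encoding.UTF8.GetString(bytes, offset, bytes.Length - offset);
    }
}
```

Usage would then be: string xmlstring = BomHelper.DecodeUtf8WithoutBom(xmlfile);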