I have an XML file with a UTF-8 BOM at the beginning of the file, which prevents me from using existing code that reads UTF-8 files.
How can I remove the BOM from the XML file in an easy way?
Here I have a variable xmlfile of type Byte[] that I convert to a string. xmlfile contains the entire XML file.
byte[] xmlfile = ((Byte[])myReader["xmlSQL"]);
string xmlstring = Encoding.UTF8.GetString(xmlfile);
CodePudding user response:
Great stuff, DBC :) That worked well with your link. To fix my problem of a UTF-8 BOM at the beginning of my XML file, I simply added a MemoryStream and a StreamReader, which automatically stripped the BOM from the XML bytes (htmlbytes). Really easy to implement in existing code.
byte[] htmlbytes = ((Byte[])myReader["xmlMelding"]);
var memorystream = new MemoryStream(htmlbytes);
var s = new StreamReader(memorystream).ReadToEnd();
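For reuse, the same idea could be wrapped in a small helper that also disposes the stream and reader. This is only a sketch: the class name XmlBytes and the method name ReadAllText are made up for illustration and are not part of the original code. It assumes the data is UTF-8, with or without a BOM.

```csharp
using System.IO;
using System.Text;

public static class XmlBytes
{
    // Hypothetical helper (name is an assumption): decodes the bytes through
    // a StreamReader so that a leading BOM is consumed rather than returned.
    public static string ReadAllText(byte[] bytes)
    {
        using (var stream = new MemoryStream(bytes))
        // Third argument = detectEncodingFromByteOrderMarks. UTF-8 is used
        // as the fallback encoding when no BOM is present.
        using (var reader = new StreamReader(stream, Encoding.UTF8, true))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Usage would then be something like: string s = XmlBytes.ReadAllText(htmlbytes);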
CodePudding user response:
Encoding.GetString() has an overload that accepts an offset into the byte[] array. Simply check whether the array starts with a BOM, and if so, skip it when calling GetString(), e.g.:
byte[] xmlfile = ((Byte[])myReader["xmlSQL"]);
int offset = 0;
// The UTF-8 BOM is the three bytes EF BB BF
if (xmlfile.Length >= 3 &&
    xmlfile[0] == 0xEF &&
    xmlfile[1] == 0xBB &&
    xmlfile[2] == 0xBF)
{
    offset = 3;
}
string xmlstring = Encoding.UTF8.GetString(xmlfile, offset, xmlfile.Length - offset);
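If the check is needed in more than one place, it could be pulled into a helper along these lines. This is a sketch, not a definitive implementation: the names BomHelper and DecodeUtf8WithoutBom are invented for illustration. Instead of hard-coding the three bytes, it compares against Encoding.UTF8.GetPreamble(), which returns the UTF-8 BOM (EF BB BF).

```csharp
using System;
using System.Text;

public static class BomHelper
{
    // Hypothetical helper (name is an assumption): decodes UTF-8 bytes,
    // skipping a leading BOM if one is present.
    public static string DecodeUtf8WithoutBom(byte[] bytes)
    {
        // For Encoding.UTF8 the preamble is the BOM: EF BB BF.
        byte[] bom = Encoding.UTF8.GetPreamble();
        int offset = 0;

        if (bytes.Length >= bom.Length)
        {
            bool hasBom = true;
            for (int i = 0; i < bom.Length; i++)
            {
                if (bytes[i] != bom[i])
                {
                    hasBom = false;
                    break;
                }
            }
            if (hasBom)
            {
                offset = bom.Length;
            }
        }

        // Decode everything after the BOM (or the whole array if there is none).
        return Encoding.UTF8.GetString(bytes, offset, bytes.Length - offset);
    }
}
```

With that in place, the call site reduces to: string xmlstring = BomHelper.DecodeUtf8WithoutBom(xmlfile);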