Home > Software design >  XML file that starts with a # (hash)
XML file that starts with a # (hash)

Time:10-07

I am trying to load XML files with XDocument.Load(stream), but some files are declared as

#<?xml version="1.0" encoding="utf-16"?>
<root>
</root>

Because of the starting # I am getting an XmLException with the message

Data at the root level is invalid. Line 1, position 1.

To handle this anomaly, I have this fall back idea:

XDocument? xDoc = null;
var fileName = "C:\dummy.xml";

try
{
    await using var fileStream = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
    xDoc = await XDocument.LoadAsync(fileStream, LoadOptions.None, CancellationToken.None).ConfigureAwait(false);
}
catch (Exception e)
{
    //my idea would be, if an exception occured
    //- read the file into memory 
    //- verify if it starts with '#'
    //- drop the '#'
}

Is there a better way to handle files with the #?

Is the # some XML specification I am unaware of?

CodePudding user response:

I would probably go through a separate phase beforehand, copying files without their first byte, where appropriate. For example:

foreach (var file in Directory.GetFiles("*.xml"))
{
    // Note: this assumes all XML files are UTF-8.
    using (var input = File.OpenRead(file))
    {
        int firstByte = input.ReadByte();
        if (firstByte != '#')
        {
            continue;
        }
        string tmp = file   "-tmp";
        using (var copy = File.Create(tmp))
        {
            input.CopyTo(copy);
        }
        // Close the stream so we can rename.
        input.Close();
        // Move the original file to a backup, and the temporary to the original name
        File.Move(file, Path.ChangeExtension(file, ".bak"));
        File.Move(tmp, file);
    }    
}
  • Related