I am trying to load XML files with XDocument.Load(stream), but some files start like this:
#<?xml version="1.0" encoding="utf-16"?>
<root>
</root>
Because of the leading #, I am getting an XmlException with the message:
Data at the root level is invalid. Line 1, position 1.
To handle this anomaly, I have this fallback idea:
XDocument? xDoc = null;
var fileName = @"C:\dummy.xml";
try
{
    await using var fileStream = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
    xDoc = await XDocument.LoadAsync(fileStream, LoadOptions.None, CancellationToken.None).ConfigureAwait(false);
}
catch (Exception e)
{
    // My idea, if an exception occurred:
    // - read the file into memory
    // - verify that it starts with '#'
    // - drop the '#'
}
Is there a better way to handle files with the leading #?
Is the # part of some XML specification I am unaware of?
CodePudding user response:
I would probably go through a separate phase beforehand, copying files without their first byte, where appropriate. For example:
// Scans the current directory; adjust the path as needed.
foreach (var file in Directory.GetFiles(".", "*.xml"))
{
    // Note: this assumes '#' is a single leading byte, i.e. the files are
    // UTF-8 (or another ASCII-compatible encoding) without a byte order mark.
    using (var input = File.OpenRead(file))
    {
        int firstByte = input.ReadByte();
        if (firstByte != '#')
        {
            continue;
        }
        string tmp = file + "-tmp";
        using (var copy = File.Create(tmp))
        {
            input.CopyTo(copy);
        }
        // Close the stream so we can rename.
        input.Close();
        // Move the original file to a backup, and the temporary to the original name.
        File.Move(file, Path.ChangeExtension(file, ".bak"));
        File.Move(tmp, file);
    }
}
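If rewriting the files on disk is not an option, the '#' could instead be skipped while reading. The following is only a minimal sketch under some assumptions: LenientXmlLoader is a hypothetical helper name, and the files are assumed to be UTF-8 or to carry a byte order mark that StreamReader can detect. Because the document is loaded from a TextReader, the text is already decoded, so the encoding="utf-16" attribute in the declaration should not affect how the bytes are read.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Linq;

// Hypothetical helper (not part of any library): loads an XML document
// while ignoring a single stray '#' in front of the XML declaration.
public static class LenientXmlLoader
{
    public static XDocument Load(Stream stream)
    {
        // Assumption: UTF-8 unless a byte order mark says otherwise.
        // leaveOpen: true keeps the caller in charge of disposing the stream.
        using var reader = new StreamReader(
            stream,
            Encoding.UTF8,
            detectEncodingFromByteOrderMarks: true,
            bufferSize: 4096,
            leaveOpen: true);

        // Peek returns the next character without consuming it (-1 at end of input).
        if (reader.Peek() == '#')
        {
            reader.Read(); // consume the stray '#'
        }

        return XDocument.Load(reader);
    }
}
```

With this in place, LenientXmlLoader.Load(fileStream) could stand in for XDocument.Load(fileStream) in the question's code. Well-formed files load as before, since the check only consumes a character when it is '#'.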