It appears that `System.Xml.Linq` is consuming an enormous amount of memory, even after any resources should have been freed.
A simple demonstration:
```csharp
// xmlFilePath and ct (a CancellationToken) are defined elsewhere
await using ( System.IO.FileStream stream = new ( xmlFilePath, System.IO.FileMode.Open ) ) {
    using ( System.Xml.XmlReader reader = System.Xml.XmlReader.Create( stream, new () { ConformanceLevel = System.Xml.ConformanceLevel.Fragment, Async = true } ) ) {
        int i = 0;
        while ( await reader.ReadAsync().ConfigureAwait( false ) ) {
            while ( reader.NodeType != System.Xml.XmlNodeType.None ) {
                if ( reader.NodeType == System.Xml.XmlNodeType.XmlDeclaration ) {
                    await reader.SkipAsync().ConfigureAwait( false );
                    continue;
                }
                // Stop instead of spinning forever once cancellation is requested
                ct.ThrowIfCancellationRequested();
                i++;
                if ( i % 100000 == 0 ) {
                    Console.WriteLine( $"Processed {i}: {reader.ReadString()}" );
                }
                // Reads the current node, including all of its descendants, into an XNode tree
                System.Xml.Linq.XNode node = await System.Xml.Linq.XNode.ReadFromAsync( reader, ct ).ConfigureAwait( false );
            }
        }
    }
}
Console.WriteLine( $"\n---->Memory Use/false: {GC.GetTotalMemory(false):N0}");
Console.WriteLine( $"---->Memory Use : {GC.GetTotalMemory(true):N0}\n");
return;
```
Output:
```
---->Memory Use/false: 402,639,448
---->Memory Use : 400,967,152
```
If I replace the `XNode` portion with a plain `reader.ReadAsync()` call:
```csharp
string xmlFilePath = "/home/eric/dev/src/github.com/erichiller/mkmrk-dotnet/src/Cli/dataset/cme/definition/2021/11/2021-11-05/20211104.061134-05_20211104.030927-05_cmeg.nymex.fut.prf.xml";
await using ( System.IO.FileStream stream = new ( xmlFilePath, System.IO.FileMode.Open ) ) {
    using ( System.Xml.XmlReader reader = System.Xml.XmlReader.Create( stream, new () { ConformanceLevel = System.Xml.ConformanceLevel.Fragment, Async = true } ) ) {
        int i = 0;
        while ( await reader.ReadAsync().ConfigureAwait( false ) ) {
            while ( reader.NodeType != System.Xml.XmlNodeType.None ) {
                if ( reader.NodeType == System.Xml.XmlNodeType.XmlDeclaration ) {
                    await reader.SkipAsync().ConfigureAwait( false );
                    continue;
                }
                // Stop instead of spinning forever once cancellation is requested
                ct.ThrowIfCancellationRequested();
                i++;
                if ( i % 100000 == 0 ) {
                    Console.WriteLine( $"Processed {i}: {reader.ReadString()}" );
                }
                // Only advances the reader; nothing is materialized
                await reader.ReadAsync().ConfigureAwait( false );
            }
        }
    }
}
Console.WriteLine( $"\n---->Memory Use/false: {GC.GetTotalMemory(false):N0}");
Console.WriteLine( $"---->Memory Use : {GC.GetTotalMemory(true):N0}\n");
return;
```
Memory use drops considerably:
```
---->Memory Use/false: 11,048,992
---->Memory Use : 6,317,248
```
What am I misunderstanding here, or doing wrong? The file being loaded is large (~60 MB), but even if `XNode` needed to use that much memory, shouldn't it be released by the time `Console.WriteLine` is reached?
Answer:
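LINQ to XML builds the XML it reads into an in-memory tree, creating many objects (`XElement`, `XAttribute`, `XText`, ...) to represent it. `XNode.ReadFromAsync` reads the node the reader is currently on together with all of its descendants, so if the reader is positioned on a large element (for example the root), that entire subtree is materialized at once. You appear to be doing this repeatedly in a loop while reading, with nothing limiting how much of the document each call pulls in.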
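To see why a single call can be so expensive, here is a minimal sketch that positions a reader on the root element and calls `XNode.ReadFrom` once (the synchronous counterpart of `ReadFromAsync`, used here for brevity). The class and method names are hypothetical. The returned element already contains every descendant, i.e. the entire document.

```csharp
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class ReadFromDemo
{
    // Hypothetical helper: moves to the root element, calls XNode.ReadFrom once,
    // and reports how many descendant elements ended up in the resulting tree.
    public static int DescendantsMaterializedFromRoot(string xml)
    {
        using var reader = XmlReader.Create(new StringReader(xml));

        // Skips the XML declaration, comments and whitespace; stops on the root element
        reader.MoveToContent();

        // One call reads the root and everything below it into memory
        var root = (XElement)XNode.ReadFrom(reader);
        return root.Descendants().Count();
    }
}
```

If the question's file is a single root element wrapping all of the data (an assumption; the file's layout is not shown), the loop's first `ReadFromAsync` call would likewise build the whole ~60 MB file as one tree.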
A plain `XmlReader`, on the other hand, is a forward-only streaming reader: it gives you manual control, reads only as much as it needs for the current node, and leaves it to you to decide what to keep.
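The two can be combined to keep memory bounded: stream with `XmlReader` and call `XNode.ReadFrom` only on the small, repeated child elements, so only one of them is materialized at a time. A minimal sketch, assuming the data consists of many sibling elements with a known name. `StreamElements` is a hypothetical helper, and `"item"` in the usage is a placeholder, not the CME file's real element name.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class StreamingDemo
{
    // Hypothetical helper: yields each element named elementName as its own small XElement.
    // Only the current element's subtree is built; earlier ones become garbage as soon
    // as the caller stops referencing them.
    public static IEnumerable<XElement> StreamElements(TextReader input, string elementName)
    {
        using XmlReader reader = XmlReader.Create(input);
        reader.MoveToContent();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
            {
                // ReadFrom consumes the whole element and leaves the reader on the next node,
                // so Read() must not be called here or an adjacent sibling would be skipped.
                yield return (XElement)XNode.ReadFrom(reader);
            }
            else
            {
                reader.Read();
            }
        }
    }
}
```

For example, `StreamElements(new StringReader("<root><item>1</item><item>2</item></root>"), "item")` yields two separate `item` elements. Peak memory then tracks the size of one element rather than the whole file. The same shape should carry over to `ReadAsync` / `XNode.ReadFromAsync` inside an `async` iterator (`IAsyncEnumerable<XElement>`), though that variant is not shown here.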
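Leaving a `using` block disposes the reader and the stream (closing the file handle), but it does not free the managed objects you created. Those stay on the heap until the GC runs and finds that nothing references them any more.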
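The reachability point can be illustrated with a `WeakReference` (class and method names below are hypothetical). A tree built in a separate, non-inlined method is reclaimed by a forced collection once nothing references it, while a tree still held by a live local survives. By the same logic, if `GC.GetTotalMemory(true)` still reports roughly 400 MB, something presumably still references the tree. One possibility, not verified against the question's project, is that locals in an `async` method are hoisted into fields of the compiler-generated state machine, and in Debug builds those fields may stay populated until the method returns.

```csharp
using System;
using System.Linq;
using System.Runtime.CompilerServices;
using System.Xml.Linq;

public static class ReachabilityDemo
{
    // Builds a tree in its own non-inlined frame and returns only a weak reference,
    // so no strong reference to the tree survives the call.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static WeakReference BuildTree(int children)
    {
        var root = new XElement("root",
            Enumerable.Range(0, children).Select(n => new XElement("item", n)));
        return new WeakReference(root);
    }

    // True if a forced full collection reclaims the tree once it is unreachable.
    public static bool TreeIsCollectedWhenUnreachable()
    {
        WeakReference weak = BuildTree(100_000);
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        return !weak.IsAlive;
    }

    // True if the tree survives a collection while a local still references it.
    public static bool TreeSurvivesWhileReferenced()
    {
        var root = new XElement("root", new XElement("item", 1));
        var weak = new WeakReference(root);
        GC.Collect();
        bool alive = weak.IsAlive;
        GC.KeepAlive(root); // keeps root reachable up to this point
        return alive;
    }
}
```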