`System.Xml.Linq` consumes a massive amount of memory

Time:12-27

It appears that System.Xml.Linq consumes an enormous amount of memory, and that memory is not released even after all resources should have been freed. A simple demonstration:

await using ( System.IO.FileStream stream = new ( xmlFilePath, System.IO.FileMode.Open) ) {
    using ( System.Xml.XmlReader reader = System.Xml.XmlReader.Create( stream, new () { ConformanceLevel = System.Xml.ConformanceLevel.Fragment, Async = true } ) ) {
        int i = 0;
        while ( await reader.ReadAsync().ConfigureAwait( false ) ) {
            while ( reader.NodeType != System.Xml.XmlNodeType.None ) {
                if ( reader.NodeType == System.Xml.XmlNodeType.XmlDeclaration ) {
                    await reader.SkipAsync().ConfigureAwait( false );
                    continue;
                }
                // Abort the read loop when cancellation is requested (a bare `continue` here would never advance the reader).
                ct.ThrowIfCancellationRequested();
                i++;
                if ( i % 100000 == 0 ) {
                    Console.WriteLine( $"Processed {i}: {reader.ReadString()}" );
                }
                // Materializes the reader's current node together with its entire subtree.
                System.Xml.Linq.XNode node = await System.Xml.Linq.XNode.ReadFromAsync( reader, ct ).ConfigureAwait( false );

            }
        }
    }
}
Console.WriteLine( $"\n---->Memory Use/false: {GC.GetTotalMemory(false):N0}");
Console.WriteLine( $"---->Memory Use      : {GC.GetTotalMemory(true):N0}\n");
return;

Output:

---->Memory Use/false: 402,639,448
---->Memory Use      : 400,967,152

If I replace the XNode portion with a plain reader.ReadAsync() call:

string xmlFilePath = "/home/eric/dev/src/github.com/erichiller/mkmrk-dotnet/src/Cli/dataset/cme/definition/2021/11/2021-11-05/20211104.061134-05_20211104.030927-05_cmeg.nymex.fut.prf.xml";
await using ( System.IO.FileStream stream = new ( xmlFilePath, System.IO.FileMode.Open) ) {
    using ( System.Xml.XmlReader reader = System.Xml.XmlReader.Create( stream, new () { ConformanceLevel = System.Xml.ConformanceLevel.Fragment, Async = true } ) ) {
        int i = 0;
        while ( await reader.ReadAsync().ConfigureAwait( false ) ) {
            while ( reader.NodeType != System.Xml.XmlNodeType.None ) {
                if ( reader.NodeType == System.Xml.XmlNodeType.XmlDeclaration ) {
                    await reader.SkipAsync().ConfigureAwait( false );
                    continue;
                }
                // Abort the read loop when cancellation is requested (a bare `continue` here would never advance the reader).
                ct.ThrowIfCancellationRequested();
                i++;
                if ( i % 100000 == 0 ) {
                    Console.WriteLine( $"Processed {i}: {reader.ReadString()}" );
                }
                // Advance to the next node without building any LINQ to XML objects.
                await reader.ReadAsync().ConfigureAwait( false );
            }
        }
    }
}
Console.WriteLine( $"\n---->Memory Use/false: {GC.GetTotalMemory(false):N0}");
Console.WriteLine( $"---->Memory Use      : {GC.GetTotalMemory(true):N0}\n");
return;

Memory use drops considerably:

---->Memory Use/false: 11,048,992
---->Memory Use      : 6,317,248

What am I misunderstanding or doing wrong here? The file being loaded is large (~60 MB), but even if XNode needed that much memory, shouldn't it be released by the time Console.WriteLine is reached?

Answer:

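LINQ to XML eagerly loads XML into memory, creating many objects to represent it. You appear to be doing that repeatedly in a loop while reading, with nothing limiting the recursive traversal: XNode.ReadFromAsync materializes the reader's current node together with its entire subtree, so when the reader is positioned on the root element, a single call builds the whole document.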
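A minimal sketch of that effect, assuming a tiny hypothetical document in place of the real file: once the reader is positioned on the root element, one call to the synchronous XNode.ReadFrom (the counterpart of ReadFromAsync) returns an XElement that already contains every child.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

// Hypothetical stand-in for the large input file.
const string xml = "<?xml version=\"1.0\"?><root><item id=\"1\"/><item id=\"2\"/><item id=\"3\"/></root>";

using XmlReader reader = XmlReader.Create(new StringReader(xml));

// MoveToContent skips the XML declaration and stops on the <root> start tag.
reader.MoveToContent();

// ReadFrom materializes the current node and its whole subtree:
// on the root element, that is the entire document.
XElement root = (XElement)XNode.ReadFrom(reader);

Console.WriteLine(root.Elements("item").Count()); // 3
```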

An XmlReader, by contrast, gives you manual control: it reads only as much information as it needs and lets the consumer decide what to do with it.
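A minimal sketch of the streaming approach, assuming the records of interest are repeated elements with a known name. The StreamElements helper, the item element name, and the inline sample document are hypothetical and not taken from the question. The reader walks the document node by node, and XNode.ReadFrom is called only on each matching element, so at most one small subtree is materialized at a time.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical input: in the question this would be the large file opened via FileStream.
const string xml = "<root><item id=\"1\"/><item id=\"2\"/><item id=\"3\"/></root>";

var ids = new List<string>();
using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
{
    foreach (XElement item in StreamElements(reader, "item"))
    {
        // Only this one element is in memory; once nothing references it,
        // it becomes eligible for garbage collection.
        ids.Add(item.Attribute("id")?.Value ?? "");
    }
}
Console.WriteLine(string.Join(",", ids)); // 1,2,3

// Hypothetical helper: yields each element named `localName` as a standalone
// XElement, one at a time, instead of building a tree for the whole document.
static IEnumerable<XElement> StreamElements(XmlReader reader, string localName)
{
    reader.MoveToContent();
    while (!reader.EOF)
    {
        if (reader.NodeType == XmlNodeType.Element && reader.LocalName == localName)
        {
            // ReadFrom consumes the element and its subtree and leaves the reader
            // on the following node, so Read() must not be called here as well.
            if (XNode.ReadFrom(reader) is XElement element)
            {
                yield return element;
            }
        }
        else
        {
            reader.Read();
        }
    }
}
```

The helper deliberately skips the extra Read() after ReadFrom: an additional Read() there would jump over a sibling element that immediately follows the one just read.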

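Memory is not freed just because a block is closed. Leaving a using block disposes the stream and the reader, but the objects created on the managed heap are reclaimed by the GC only at some point after nothing references them anymore. Since GC.GetTotalMemory(true) first forces a garbage collection before measuring, memory still reported at that point is most likely still reachable, for example through a local that is kept alive until the method returns (which can happen in Debug builds or when the local is hoisted into an async method's state machine).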
