Memory leak when using PLINQ AsParallel


I wrote a piece of test code that uses AsParallel to read big files concurrently, and it causes what looks like a memory leak: the GC doesn't seem to reclaim the unused objects as expected. Please see the code snippet.

        using System;
        using System.Collections.Generic;
        using System.IO;
        using System.Linq;

        static void Main(string[] args)
        {
            int[] array = new[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

            //foreach (var element in array) // No leak
            //{
            //    ReadBigFile();
            //}

            array.AsParallel().ForAll(l => ReadBigFile()); // Memory leak

            Console.ReadLine();
        }

        private static void ReadBigFile()
        {
            // Each call loads the entire file into a fresh byte[].
            List<byte[]> lst = new List<byte[]>();
            var file = "<BigFilePath>"; // 600 MB
            lst.Add(File.ReadAllBytes(file));
        }

I tried this both sequentially and in parallel. The sequential foreach runs fine, with no memory growth. But when I use AsParallel to read the file concurrently, the process climbs to 6 GB of memory and never goes back down.

Can anyone help identify the root cause? And what is the right way to accomplish the same task concurrently? Thank you.

PS: The issue happens on both .NET Framework (4.6.1) and .NET (6.0).

CodePudding user response:

The byte arrays allocated to hold your 600 MB file are considered "large objects" (85,000 bytes or larger) and as such are allocated on the Large Object Heap (LOH).

To clean up these objects, a Generation 2 collection needs to occur, and those happen far less often than the Generation 0 collections that reclaim short-lived objects. A refresher on how the garbage collector works is a good idea if this behavior is surprising.

The reason GC.Collect() "frees" this memory is that calling it with no arguments performs a full, blocking collection of every generation, including the Large Object Heap, which is collected along with Gen 2.
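For illustration only (forcing collections is rarely the right production fix), here is a minimal sketch of what such a full collection looks like; the LOH compaction flag is an optional extra that addresses fragmentation, not whether memory is reclaimed:

        using System;
        using System.Runtime;

        static void ForceFullCollection()
        {
            // Request LOH compaction on the next blocking Gen 2 collection.
            // Available on .NET Framework 4.5.1+ and modern .NET; without it
            // the LOH is swept but left fragmented rather than compacted.
            GCSettings.LargeObjectHeapCompactionMode =
                GCLargeObjectHeapCompactionMode.CompactOnce;

            GC.Collect();                  // full, blocking collection of all generations
            GC.WaitForPendingFinalizers(); // let any finalizers run to completion
            GC.Collect();                  // reclaim objects whose finalizers just ran
        }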

To address your concerns about memory in production, you should consider streaming these files in if possible (a sketch follows below). If not, you will need to batch your files carefully, because crunching through hundreds of half-gigabyte files in parallel is likely to cripple you on both CPU and I/O, depending on the environment; the runtime can only ask the OS for so much memory.
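Here is a minimal sketch of the streaming approach, assuming the file only needs to be scanned chunk by chunk (the path and buffer size are placeholders):

        using System.IO;

        static void ProcessBigFileStreamed(string path)
        {
            // One reusable 1 MB buffer instead of a single 600 MB array on the LOH.
            byte[] buffer = new byte[1024 * 1024];

            using (var stream = File.OpenRead(path))
            {
                int bytesRead;
                while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
                {
                    // Process buffer[0..bytesRead) here; the buffer is reused,
                    // so memory stays flat regardless of file size.
                }
            }
        }

If the files genuinely must be loaded whole, PLINQ's WithDegreeOfParallelism can at least cap how many are in memory at once, e.g. array.AsParallel().WithDegreeOfParallelism(2).ForAll(l => ReadBigFile());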

CodePudding user response:

@DiplomacyNotWar I like your previous comment:

"If you do it sequentially then you'll have no references to the object from the previous loop, which makes it available for moving to the next generation and ultimately garbage collection."

Then I modified the code as follows:

        int[] array = new[] { 1, 2 };
        for (int i = 0; i < 5; i++)
        {
            array.AsParallel().ForAll(l => ReadBigFile());
        }

Now I can see the memory allocation is only 1.1 GB, which should be the memory needed for the last round of the loop. So I'm convinced now that it's just a matter of GC timing and not a real memory leak. Thank you @DiplomacyNotWar very much!
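One way to double-check this conclusion (my own addition, not part of the original exchange) is to measure memory with and without forcing a full collection:

        // After the parallel loop finishes:
        long before = GC.GetTotalMemory(forceFullCollection: false);
        long after = GC.GetTotalMemory(forceFullCollection: true); // triggers a full GC first
        Console.WriteLine($"Before: {before / 1024 / 1024} MB, after full GC: {after / 1024 / 1024} MB");
        // If 'after' drops back near baseline, the byte[] buffers were merely
        // awaiting collection: a GC timing effect, not a leak.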
