Converting a Stream to an array using memoryStream.Read(arr, 0, length)
for a 19 MB file.
When running it on machine 1 it takes approximately 1.26 s; on machine 2 it takes approximately 3 s.
Why is there a difference in performance? Is it related to the machine's RAM usage or CPU? Do we need to increase RAM?
using (var pdfContent = new MemoryStream(System.IO.File.ReadAllBytes(path)))
{
    Stopwatch stopwatch = new Stopwatch();
    stopwatch.Start();
    byte[] buffer = new byte[pdfContent.Length];
    pdfContent.Read(buffer, 0, (int)pdfContent.Length);
    stopwatch.Stop();
    Console.WriteLine($"End Time: {stopwatch.Elapsed}");
}
CodePudding user response:
TL;DR: 1. The result of file operations depends heavily on your machine configuration (the type and even the model of the disk is the most important factor in this kind of test). 2. You should read the file in chunks.
Let's look a bit closer at that example. I prepared a test text file of 21,042,116 bytes (about 21 MB), created a new console application, and added the benchmarking library BenchmarkDotNet.
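The original answer does not show how the payload file was produced; here is a minimal one-off sketch that creates a file of the same size (filling it with random bytes is my assumption, since only the size matters for this test):
using System;
using System.IO;

// Hypothetical generator, not from the original answer: creates a ~21 MB test file.
var payload = new byte[21_042_116];
new Random(42).NextBytes(payload);          // fixed seed for reproducible content
File.WriteAllBytes("payload.txt", payload);
With the payload in place, here is the benchmark project: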
using System;
using System.Diagnostics;
using System.IO;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;

namespace stream_perf
{
    [SimpleJob(RuntimeMoniker.NetCoreApp50)]
    [RPlotExporter]
    public class StreamBenchmarks
    {
        [Benchmark]
        public void Stackoverflow()
        {
            using (var pdfContent = new MemoryStream(System.IO.File.ReadAllBytes("payload.txt")))
            {
                byte[] buffer = new byte[pdfContent.Length];
                pdfContent.Read(buffer, 0, (int)pdfContent.Length);
            }
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            var summary = BenchmarkRunner.Run<StreamBenchmarks>();
        }
    }
}
Using a console, I ran two commands:
dotnet build -c release
dotnet run -c release
That gave me the following result:
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i5-8300H CPU 2.30GHz (Coffee Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.103
[Host] : .NET Core 5.0.3 (CoreCLR 5.0.321.7212, CoreFX 5.0.321.7212), X64 RyuJIT
.NET Core 5.0 : .NET Core 5.0.3 (CoreCLR 5.0.321.7212, CoreFX 5.0.321.7212), X64 RyuJIT
Job=.NET Core 5.0 Runtime=.NET Core 5.0
| Method | Mean | Error | StdDev |
|-------------- |---------:|---------:|---------:|
| Stackoverflow | 24.24 ms | 0.378 ms | 0.353 ms |
As you can see, on my machine it is really fast.
But is it fast enough? No, it isn't, because we copy the file data twice: the first time we read the file from disk here: System.IO.File.ReadAllBytes("payload.txt"),
and the second time we copy the same bytes again, in memory, here: pdfContent.Read(buffer, 0, (int)pdfContent.Length);
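To make the double work explicit, here is the same code with my comments added (a minimal sketch):
byte[] bytes = File.ReadAllBytes("payload.txt");   // copy 1: disk -> byte array
using (var ms = new MemoryStream(bytes))           // wraps the existing array, no copy
{
    byte[] buffer = new byte[ms.Length];
    ms.Read(buffer, 0, (int)ms.Length);            // copy 2: array -> buffer
    // 'bytes' already held the entire file; 'buffer' is a redundant duplicate.
}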
So I added the following method to my benchmarks:
[Benchmark]
public void ReadChunked()
{
    int totalBytes = 0;
    int readBytes = 0;
    using (var pdfStream = new System.IO.FileStream("payload.txt", FileMode.Open))
    {
        byte[] buffer = new byte[4096];
        // read 4 KB at a time until Read returns 0 (end of file)
        while ((readBytes = pdfStream.Read(buffer)) != 0)
        {
            // do something with buffer
            totalBytes += readBytes;
        }
    }
}
In this new method we read the file in chunks, which gives us some advantages:
- We read the file only once
- We do not need to allocate a buffer in RAM equal to the file size (see the variant sketch below)
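A note on the buffer size: 4096 bytes is FileStream's default internal buffer size. As a variant, here is a sketch (my own addition, not part of the original benchmark run) that uses a larger buffer and hints to the OS that the file will be read sequentially:
[Benchmark]
public void ReadChunkedLargeBuffer()
{
    long totalBytes = 0;
    // 80 KB buffer and a SequentialScan hint; whether this helps depends on
    // the disk and the OS cache, so measure before adopting it.
    using (var stream = new FileStream("payload.txt", FileMode.Open, FileAccess.Read,
                                       FileShare.Read, bufferSize: 81920,
                                       FileOptions.SequentialScan))
    {
        byte[] buffer = new byte[81920];
        int readBytes;
        while ((readBytes = stream.Read(buffer, 0, buffer.Length)) != 0)
        {
            totalBytes += readBytes; // process the chunk here
        }
    }
}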
Let's look at the benchmark:
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i5-8300H CPU 2.30GHz (Coffee Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.103
[Host] : .NET Core 5.0.3 (CoreCLR 5.0.321.7212, CoreFX 5.0.321.7212), X64 RyuJIT
.NET Core 5.0 : .NET Core 5.0.3 (CoreCLR 5.0.321.7212, CoreFX 5.0.321.7212), X64 RyuJIT
Job=.NET Core 5.0 Runtime=.NET Core 5.0
| Method | Mean | Error | StdDev |
|-------------- |---------:|---------:|---------:|
| Stackoverflow | 23.85 ms | 0.149 ms | 0.132 ms |
| ReadChunked | 18.68 ms | 0.076 ms | 0.071 ms |
The new method is about 21% faster (23.85 ms vs. 18.68 ms).