Converting a Stream to an array using memoryStream.Read(arr, 0, length)
for a 19 MB file.
When running it on machine 1 it takes approximately 1.26 s; on machine 2 it takes approximately 3 s.
Why is there a difference in performance? Is it related to the machine's RAM usage or CPU? Do we need to increase RAM?
using (var pdfContent = new MemoryStream(System.IO.File.ReadAllBytes(path)))
{
    Stopwatch stopwatch = new Stopwatch();
    stopwatch.Start();
    byte[] buffer = new byte[pdfContent.Length];
    pdfContent.Read(buffer, 0, (int)pdfContent.Length);
    stopwatch.Stop();
    Console.WriteLine($"End Time: {stopwatch.Elapsed}");
}
CodePudding user response:
TL;DR: 1. The result of file operations depends heavily on your machine configuration (the type and even the model of the disk is the most important factor in this kind of test). 2. You should read the file in chunks.
Let's look a bit closer at that example. I prepared a test text file of 21,042,116 bytes (about 21 MB), created a new console application, and added the benchmarking library BenchmarkDotNet.
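The original answer does not show how the payload file was produced; here is a minimal one-off sketch that creates a file of the same size (filling it with random bytes is my assumption, since only the size matters for this test):
using System;
using System.IO;

// Hypothetical generator, not from the original answer: creates a ~21 MB test file.
var payload = new byte[21_042_116];
new Random(42).NextBytes(payload);          // fixed seed for reproducible content
File.WriteAllBytes("payload.txt", payload);
With the payload in place, here is the benchmark project: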
using System;
using System.Diagnostics;
using System.IO;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;

namespace stream_perf
{
    [SimpleJob(RuntimeMoniker.NetCoreApp50)]
    [RPlotExporter]
    public class StreamBenchmarks
    {
        [Benchmark]
        public void Stackoverflow()
        {
            using (var pdfContent = new MemoryStream(System.IO.File.ReadAllBytes("payload.txt")))
            {
                byte[] buffer = new byte[pdfContent.Length];
                pdfContent.Read(buffer, 0, (int)pdfContent.Length);
            }
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            var summary = BenchmarkRunner.Run<StreamBenchmarks>();
        }
    }
}
Using a console, I ran two commands:
dotnet build -c release
dotnet run -c release
That gave me the following result:
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i5-8300H CPU 2.30GHz (Coffee Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.103
[Host] : .NET Core 5.0.3 (CoreCLR 5.0.321.7212, CoreFX 5.0.321.7212), X64 RyuJIT
.NET Core 5.0 : .NET Core 5.0.3 (CoreCLR 5.0.321.7212, CoreFX 5.0.321.7212), X64 RyuJIT
Job=.NET Core 5.0 Runtime=.NET Core 5.0
| Method | Mean | Error | StdDev |
|-------------- |---------:|---------:|---------:|
| Stackoverflow | 24.24 ms | 0.378 ms | 0.353 ms |
As you can see, on my machine it is really fast.
But is it fast enough? No, it isn't, because we copy the file data twice: the first time we read the file from disk here: System.IO.File.ReadAllBytes("payload.txt"),
and the second time we copy the same bytes again, in memory, here: pdfContent.Read(buffer, 0, (int)pdfContent.Length);
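To make the double work explicit, here is the same code with my comments added (a minimal sketch):
byte[] bytes = File.ReadAllBytes("payload.txt");   // copy 1: disk -> byte array
using (var ms = new MemoryStream(bytes))           // wraps the existing array, no copy
{
    byte[] buffer = new byte[ms.Length];
    ms.Read(buffer, 0, (int)ms.Length);            // copy 2: array -> buffer
    // 'bytes' already held the entire file; 'buffer' is a redundant duplicate.
}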
So I added the following method to my benchmarks:
[Benchmark]
public void ReadChunked()
{
    int totalBytes = 0;
    int readBytes = 0;
    using (var pdfStream = new System.IO.FileStream("payload.txt", FileMode.Open))
    {
        byte[] buffer = new byte[4096];
        // read 4 KB at a time until Read returns 0 (end of file)
        while ((readBytes = pdfStream.Read(buffer)) != 0)
        {
            // do something with buffer
            totalBytes += readBytes;
        }
    }
}
In this new method we read the file in chunks, which gives us some advantages:
- We read the file only once
- We do not need to allocate a buffer in RAM equal to the file size (see the variant sketch below)
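A note on the buffer size: 4096 bytes is FileStream's default internal buffer size. As a variant, here is a sketch (my own addition, not part of the original benchmark run) that uses a larger buffer and hints to the OS that the file will be read sequentially:
[Benchmark]
public void ReadChunkedLargeBuffer()
{
    long totalBytes = 0;
    // 80 KB buffer and a SequentialScan hint; whether this helps depends on
    // the disk and the OS cache, so measure before adopting it.
    using (var stream = new FileStream("payload.txt", FileMode.Open, FileAccess.Read,
                                       FileShare.Read, bufferSize: 81920,
                                       FileOptions.SequentialScan))
    {
        byte[] buffer = new byte[81920];
        int readBytes;
        while ((readBytes = stream.Read(buffer, 0, buffer.Length)) != 0)
        {
            totalBytes += readBytes; // process the chunk here
        }
    }
}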
Let's look at the benchmark:
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i5-8300H CPU 2.30GHz (Coffee Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.103
[Host] : .NET Core 5.0.3 (CoreCLR 5.0.321.7212, CoreFX 5.0.321.7212), X64 RyuJIT
.NET Core 5.0 : .NET Core 5.0.3 (CoreCLR 5.0.321.7212, CoreFX 5.0.321.7212), X64 RyuJIT
Job=.NET Core 5.0 Runtime=.NET Core 5.0
| Method | Mean | Error | StdDev |
|-------------- |---------:|---------:|---------:|
| Stackoverflow | 23.85 ms | 0.149 ms | 0.132 ms |
| ReadChunked | 18.68 ms | 0.076 ms | 0.071 ms |
The new method is about 21% faster (23.85 ms vs. 18.68 ms).