Why are the StreamReader functions ReadLineAsync and ReadToEndAsync so slow on a file?

Time:01-28

Does anyone know why ReadLineAsync and ReadToEndAsync are so much slower than their synchronous counterparts, ReadLine and ReadToEnd? I could understand the slowness if I were awaiting multiple calls, but that's not the case.

I'm using a Release build, and I'm not starting it with debugging.

I tested it on a 420MB CSV file, containing only the following line repeated:

1234567890123456,12345678,12345678901,1,1,1,1234567890123456789
1234567890123456,12345678,12345678901,1,1,1,1234567890123456789
1234567890123456,12345678,12345678901,1,1,1,1234567890123456789
[etc...]

I tested it with the following program (results are in comments):

static void Main(string[] args)
{
    var sw = new Stopwatch();

    sw.Restart();
    One_ReadToEnd();
    Console.WriteLine($"One_ReadToEnd: {sw.Elapsed}"); // One_ReadToEnd: 00:00:06.1749275

    sw.Restart();
    One_ReadToEndAsync().GetAwaiter().GetResult();
    Console.WriteLine($"One_ReadToEndAsync: {sw.Elapsed}"); // One_ReadToEndAsync: 00:00:23.3265661

    sw.Restart();
    Many_ReadLine();
    Console.WriteLine($"Many_ReadLine: {sw.Elapsed}");  // Many_ReadLine: 00:00:05.9391718

    sw.Restart();
    Many_ReadLineAsync().GetAwaiter().GetResult();
    Console.WriteLine($"Many_ReadLineAsync: {sw.Elapsed}"); // Many_ReadLineAsync: 00:00:31.4988402
}

const string path = @"C:\Temp\test.csv";

static void One_ReadToEnd()
{

    using (var sr = new StreamReader(new FileStream(path, FileMode.Open, FileAccess.Read), Encoding.ASCII))
    {
        sr.ReadToEnd();
        sr.Close();
    }
}

static async Task One_ReadToEndAsync()
{
    using (var sr = new StreamReader(new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read, 4096, FileOptions.Asynchronous), Encoding.ASCII))
    {
        await sr.ReadToEndAsync();
        sr.Close();
    }
}

static void Many_ReadLine()
{

    using (var sr = new StreamReader(new FileStream(path, FileMode.Open, FileAccess.Read), Encoding.ASCII))
    {
        while (!sr.EndOfStream)
            sr.ReadLine();

        sr.Close();
    }
}

static async Task Many_ReadLineAsync()
{
    using (var sr = new StreamReader(new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read, 4096, FileOptions.Asynchronous), Encoding.ASCII))
    {
        while (!sr.EndOfStream)
            await sr.ReadLineAsync();

        sr.Close();
    }
}

These were the results:

One_ReadToEnd:      00:00:06.1749275
One_ReadToEndAsync: 00:00:23.3265661
Many_ReadLine:      00:00:05.9391718
Many_ReadLineAsync: 00:00:31.4988402

CodePudding user response:

Looking at your code, it's not an apples-to-apples comparison.

sync: using (var sr = new StreamReader(new FileStream(path, FileMode.Open, FileAccess.Read), Encoding.ASCII))

and async: using (var sr = new StreamReader(new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read, 4096, FileOptions.Asynchronous), Encoding.ASCII))

See the difference? Not the 4096 (that's the default buffer size), but the FileOptions.Asynchronous. This does not make the StreamReader asynchronous; it opens the file in async mode, i.e. the file can be read or written asynchronously ("overlapped" in Windows-speak).

Normally this shouldn't make a difference, but who knows what layers of code are in there, so try without the FileStream options and see if that changes things.

the docs say:

    using (StreamReader reader = File.OpenText("existingfile.txt"))
    {
        Console.WriteLine("Opened file.");
        result = await reader.ReadToEndAsync();
        Console.WriteLine("Contains: " + result);
    }

in the ReadToEndAsync documentation, so it's not necessary to open the file overlapped to get your app running asynchronously. Or you can try opening the file in overlapped mode in your synchronous version.
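
To make the comparison like-for-like, something along these lines could be used. It is only a sketch: the class name OverlappedComparison, the helper names and the small temp file are made up for illustration, and the timings on a 1 MB file will not be meaningful, so point it at the real CSV to measure anything. The only thing that varies between runs is the FileOptions flag.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Text;
using System.Threading.Tasks;

static class OverlappedComparison
{
    // Opens the file with identical arguments every time; only the
    // FileOptions flag under test changes.
    static StreamReader Open(string path, FileOptions options) =>
        new StreamReader(
            new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read, 4096, options),
            Encoding.ASCII);

    public static long CountSync(string path, FileOptions options)
    {
        using (var sr = Open(path, options))
            return sr.ReadToEnd().Length;
    }

    public static async Task<long> CountAsync(string path, FileOptions options)
    {
        using (var sr = Open(path, options))
            return (await sr.ReadToEndAsync()).Length;
    }

    static void Main()
    {
        // Hypothetical stand-in file; replace with the real test file.
        string path = Path.GetTempFileName();
        File.WriteAllText(path, new string('x', 1000000), Encoding.ASCII);
        try
        {
            foreach (var options in new[] { FileOptions.None, FileOptions.Asynchronous })
            {
                var sw = Stopwatch.StartNew();
                long chars = CountSync(path, options);
                Console.WriteLine($"{options,-12} ReadToEnd:      {sw.Elapsed} ({chars} chars)");

                sw.Restart();
                chars = CountAsync(path, options).GetAwaiter().GetResult();
                Console.WriteLine($"{options,-12} ReadToEndAsync: {sw.Elapsed} ({chars} chars)");
            }
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

If the sync read also slows down with FileOptions.Asynchronous, the flag is the culprit; if only the async calls are slow with either flag, the overhead is in StreamReader's async path.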


Edit: I had a quick look at the source. The logic looks the same in the sync and async C# code, even though the async code accesses the streams via properties and is littered with GetAwait-type calls, but the async section has this comment. I can't find the referenced bug, but maybe this is what causes the massive slowdown when accessing the file.

    // Access to instance fields of MarshalByRefObject-derived types requires special JIT helpers that check
    // if the instance operated on is remote. This is optimised for fields on this but if a method is Async
    // and is thus lifted to a state machine type, access will be slow.
    // As a workaround, we either cache instance fields in locals or use properties to access such fields.

    // See Dev11 bug #370300 for more info.
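
For what it's worth, the workaround that comment describes (cache instance fields in locals inside an async method) looks roughly like the sketch below. NewlineCounter is a hypothetical type written for illustration, not code from StreamReader, and I have not measured whether the field access cost is what actually hurts here.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical MarshalByRefObject-derived type, for illustration only.
class NewlineCounter : MarshalByRefObject
{
    private readonly char[] _buffer = new char[4096];
    private int _total;

    public int Total => _total;

    public async Task<int> CountAsync(TextReader reader)
    {
        // Copy the instance field into a local once, so the loop inside the
        // compiler-generated state machine does not touch 'this' repeatedly.
        char[] buffer = _buffer;
        int total = 0;
        int read;
        while ((read = await reader.ReadAsync(buffer, 0, buffer.Length).ConfigureAwait(false)) > 0)
        {
            for (int i = 0; i < read; i++)
                if (buffer[i] == '\n') total++;
        }

        // Write the result back to the field a single time at the end.
        _total = total;
        return total;
    }
}

static class Demo
{
    static void Main()
    {
        var counter = new NewlineCounter();
        int n = counter.CountAsync(new StringReader("a\nb\nc\n")).GetAwaiter().GetResult();
        Console.WriteLine(n); // prints 3
    }
}
```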

CodePudding user response:

This seems to be covered by the issue "StreamReader.ReadLineAsync performance can be improved".

In the issue's thread there are benchmarks done with BenchmarkDotNet with a comment from a Microsoft dev below the results:

My observations:

  • .NET 6 is 12-13% faster on average. If IO was the limiting factor, I would expect the difference to be larger for async method.
  • Sync implementation is as twice as fast as async. This is expected, as async File IO has some non trivial overhead compared to sync. The main benefit of async File IO is improved scalability, not performance.

I had a quick look at the implementation and there is definitely place for improvement. I am going to change the issue title (there is no such thing as Stream.ReadLineAsync) and it's 2 (not 6) times slower and make it up-for-grabs.

The work to improve this was included in the System.IO work planned for .NET 7, but as of January 2023, with .NET 7.0.100 released, the issue is still open and the async versions still seem to be about 2x slower than the non-async ones.
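
Until that changes, two things may be worth trying in the async line loop. This is a sketch under assumptions, not a measured fix: the 64 KiB buffer size is an arbitrary choice of mine, and BufferedAsyncRead and CountLinesAsync are made-up names. The first idea is a larger buffer on both the FileStream and the StreamReader, so each overlapped read fetches more data. The second is looping on the null return of ReadLineAsync instead of checking EndOfStream, because EndOfStream may have to fill the buffer with a synchronous read.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

static class BufferedAsyncRead
{
    public static async Task<int> CountLinesAsync(string path, int bufferSize)
    {
        var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read,
                                bufferSize, FileOptions.Asynchronous | FileOptions.SequentialScan);

        // Disposing the reader also disposes the underlying FileStream.
        using (var sr = new StreamReader(fs, Encoding.ASCII, false, bufferSize))
        {
            int lines = 0;
            // ReadLineAsync returns null at end of file, so EndOfStream is not needed.
            while ((await sr.ReadLineAsync().ConfigureAwait(false)) != null)
                lines++;
            return lines;
        }
    }

    static void Main()
    {
        // Hypothetical stand-in file; replace with the real CSV to time it.
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "1,2,3\n4,5,6\n7,8,9\n", Encoding.ASCII);
        try
        {
            int lines = CountLinesAsync(path, 64 * 1024).GetAwaiter().GetResult();
            Console.WriteLine($"Lines: {lines}");
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

Whether this narrows the gap on the 420MB file would have to be measured with the same Stopwatch setup as in the question.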
