CopyToAsync weird behaviour when used from multiple threads

I have the following function to write to a file asynchronously from multiple threads in parallel:

static int startOffset = 0; // offset at which the next thread begins to write
static int blockSize = 10; // size of the block written by each thread
static async Task<long> WriteToFile(Stream dataToWrite)
{
   var offset = getStartOffset(); // definition of this function is given below
   using(var fs = new FileStream(fileName,
                FileMode.OpenOrCreate,
                FileAccess.ReadWrite,
                FileShare.ReadWrite))
  {
     fs.Seek(offset,SeekOrigin.Begin); 
     await dataToWrite.CopyToAsync(fs); 
  }
  return offset;
} 

/**
*I use reader writer lock here so that only one thread can access the value of the startOffset at 
*a time
*/
static int getStartOffset()
{
  int result = 0;
  try
 {
   rwl.AcquireWriterLock(Timeout.Infinite);
   result = startOffset; 
   startOffset += blockSize; // increment the startOffset for the next thread 
 }
 finally
 {
  rwl.ReleaseWriterLock(); 
 } 
 return result; 
}

I then call the above function from multiple threads to write some strings.

var tasks = new List<Task>(); 
for(int i=1;i<=4;i++)
{
   tasks.Add(Task.Run( async() => {
      String s = "aaaaaaaaaa";
      byte[] buffer = new byte [10]; 
      buffer = Encoding.Default.GetBytes(s); 
      Stream data = new MemoryStream(buffer); 
      long offset = await WriteToFile(data);  
      Console.WriteLine($"Data written at offset - {offset}"); 
   })); 
}

Task.WaitAll(tasks.ToArray());

This code works most of the time, but occasionally it writes Japanese characters or other symbols to the file. Am I doing something wrong in the multithreading?

CodePudding user response：

Your calculation of startOffset assumes that each thread is writing exactly 10 bytes. There are several issues with this.

One, the data has unknown length:

  byte[] buffer = new byte [10]; 
  buffer = Encoding.Default.GetBytes(s);

The assignment doesn't put data into the newly allocated 10-byte array. It discards the new byte[10] array (which will be garbage collected) and stores a reference to the array returned by GetBytes(s), whose length depends on the string and the encoding and is not guaranteed to be 10. If it is longer, the write overflows into the next Task's area. If it is shorter, content that existed in the file beforehand (you use OpenOrCreate) can remain in the current Task's area, past the end of the actual dataToWrite.
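To make the point about lengths concrete, here is a minimal sketch, assuming a .NET 6+ top-level program. ToFixedBlock is a hypothetical helper, not something from the question, and padding with spaces is just one possible policy for filling the rest of a block.

```csharp
using System;
using System.Text;

const int BlockSize = 10; // same role as blockSize in the question

// Hypothetical helper: encode a string and force the result to exactly
// blockSize bytes, padding with spaces and rejecting data that is too long.
static byte[] ToFixedBlock(string s, Encoding encoding, int blockSize)
{
    byte[] encoded = encoding.GetBytes(s);
    if (encoded.Length > blockSize)
        throw new ArgumentException(
            $"Encoded data is {encoded.Length} bytes, block is {blockSize}.");

    byte[] block = new byte[blockSize];
    Array.Fill(block, (byte)' ');   // padding also overwrites stale file content
    encoded.CopyTo(block, 0);
    return block;
}

// The pattern from the question: the second assignment replaces the array.
byte[] original = new byte[10];
byte[] buffer = original;
buffer = Encoding.UTF8.GetBytes("aaaaaaaaaa");
bool sameArray = ReferenceEquals(original, buffer);

// The same 10 characters (or fewer) can encode to different byte counts.
int asciiAsUtf8 = Encoding.UTF8.GetBytes("aaaaaaaaaa").Length;        // 10
int asciiAsUtf16 = Encoding.Unicode.GetBytes("aaaaaaaaaa").Length;    // 20
int kanaAsUtf8 = Encoding.UTF8.GetBytes("\u3042\u3042").Length;       // 6

Console.WriteLine($"Same array after assignment: {sameArray}");
Console.WriteLine($"UTF-8: {asciiAsUtf8}, UTF-16: {asciiAsUtf16}, kana UTF-8: {kanaAsUtf8}");

byte[] padded = ToFixedBlock("abc", Encoding.UTF8, BlockSize);
Console.WriteLine($"Padded block length: {padded.Length}");
```

With a helper along these lines, every Task would hand WriteToFile exactly blockSize bytes, so the offset arithmetic in getStartOffset stays valid.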

Two, you seek past the areas that other threads are expected to write to, but if those writes haven't completed, they haven't yet increased the file length. So you attempt to seek past the end of the file, which the Windows API allows but which might cause problems with the .NET wrappers. However, the documentation for FileStream.Seek indicates that this is fine:

When you seek beyond the length of the file, the file size grows

although this might not be precisely correct, since the Windows API documentation says:

It is not an error to set a file pointer to a position beyond the end of the file. The size of the file does not increase until you call the SetEndOfFile, WriteFile, or WriteFileEx function. A write operation increases the size of the file to the file pointer position plus the size of the buffer written, which results in the intervening bytes uninitialized.
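The seek-past-end behaviour described in these two quotes can be checked with a small experiment. This is only a sketch, assuming a .NET 6+ top-level program and a throwaway temp file in place of the question's fileName. What the gap bytes contain is up to the OS and file system, so the code prints them rather than assuming a value.

```csharp
using System;
using System.IO;

// Assumption: a throwaway temp file stands in for the question's fileName.
string path = Path.GetTempFileName(); // creates an empty file
long lengthAfterSeek, lengthAfterWrite;

using (var fs = new FileStream(path, FileMode.Open,
                               FileAccess.ReadWrite, FileShare.ReadWrite))
{
    fs.Seek(20, SeekOrigin.Begin);          // 20 bytes past the end of an empty file
    lengthAfterSeek = fs.Length;            // printed below, not assumed
    fs.Write(new byte[] { 1, 2, 3 }, 0, 3); // first write after the seek
    fs.Flush();
    lengthAfterWrite = fs.Length;           // pointer position (20) + bytes written (3)
}

byte[] contents = File.ReadAllBytes(path);
File.Delete(path);

Console.WriteLine($"Length after seek:  {lengthAfterSeek}");
Console.WriteLine($"Length after write: {lengthAfterWrite}");
Console.WriteLine($"Gap bytes: {BitConverter.ToString(contents, 0, 20)}");
```

The length after the write is 23, matching "file pointer position plus the size of the buffer written". The 20 gap bytes are the "intervening bytes" that nobody in this program wrote.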

CodePudding user response：

I think that asynchronous file I/O is not usually meant to be utilized with multithreading. Just because something is asynchronous does not mean that an operation should have multiple threads assigned to it.

To quote the documentation for async file I/O: "Asynchronous operations enable you to perform resource-intensive I/O operations without blocking the main thread." In other words, the goal is to let the calling thread carry on with other work while the I/O is in flight, not to throw a bunch of threads at a single operation. In a big enough application, nearly everything can be broken down into small tasks like this, which is how large apps stay responsive.

What you are likely experiencing is undefined behavior due to multiple threads overwriting the same region of the file. The Japanese characters you mention are likely malformed ASCII/Unicode that your text editor is attempting to interpret.

If you would like to remedy the undefined behavior and keep using asynchronous operations, you can await each individual task before starting the next one. This prevents the offset variable from being in the wrong position for the newest task, although logically it will run the same as a synchronous version.
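A rough sketch of that "await each write before starting the next" idea, assuming a .NET 6+ top-level program. WriteBlockAsync is a hypothetical variant of the question's WriteToFile that takes its offset as a parameter instead of reading shared state, and the temp file is a stand-in for fileName.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

const int BlockSize = 10; // same role as blockSize in the question

// Hypothetical variant of WriteToFile: the caller supplies the offset,
// so no shared counter or lock is needed.
static async Task<long> WriteBlockAsync(string fileName, long offset, byte[] block)
{
    using (var fs = new FileStream(fileName, FileMode.OpenOrCreate,
                                   FileAccess.ReadWrite, FileShare.ReadWrite))
    {
        fs.Seek(offset, SeekOrigin.Begin);
        await fs.WriteAsync(block, 0, block.Length);
    }
    return offset;
}

// Assumption: a throwaway temp file stands in for the question's fileName.
string path = Path.GetTempFileName();

for (int i = 0; i < 4; i++)
{
    // Block i is ten copies of 'a', 'b', 'c' or 'd', so the result is easy to read.
    byte[] block = Encoding.ASCII.GetBytes(new string((char)('a' + i), BlockSize));

    // Awaiting here means write i has finished before write i + 1 starts.
    long offset = await WriteBlockAsync(path, (long)i * BlockSize, block);
    Console.WriteLine($"Data written at offset - {offset}");
}

string content = File.ReadAllText(path, Encoding.ASCII);
File.Delete(path);
Console.WriteLine(content); // aaaaaaaaaabbbbbbbbbbccccccccccdddddddddd
```

Because each write completes before the next begins, every seek lands at or before the current end of the file and no gap is ever left behind. The trade-off is the one mentioned above: the writes no longer overlap.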