Why do mutexes sometimes fail to fix a multi-threading access problem?

I'm hoping someone can fill a gap in my knowledge of multithreaded programming here.

I have used mutexes on many occasions to successfully share access to data-structures between threads. For example, I'll often feed a queue on the main thread that is consumed by a worker thread. A mutex wraps access to the queue as tightly as possible when the main thread enqueues something, or when the worker thread dequeues something. It works great!
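Roughly, that pattern looks like this (a minimal sketch; the element type and function names here are just for illustration):

```cpp
#include <mutex>
#include <optional>
#include <queue>

std::queue<int> workQueue;   // shared between the main thread and a worker
std::mutex queueMutex;       // guards every access to workQueue

// Main thread: hold the lock only for the push itself.
void enqueue(int job)
{
    std::lock_guard<std::mutex> lock(queueMutex);
    workQueue.push(job);
}

// Worker thread: hold the lock only while inspecting/popping the queue.
std::optional<int> dequeue()
{
    std::lock_guard<std::mutex> lock(queueMutex);
    if (workQueue.empty())
        return std::nullopt;
    int job = workQueue.front();
    workQueue.pop();
    return job;
}
```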

Okay, now consider a different situation I found myself in recently. Multiple threads are rendering triangles to the same framebuffer/z-buffer pair. (This may be a bad idea on its face, but please overlook that for the moment.) In other words, the workload of rendering a bunch of triangles is evenly distributed across these threads, all of which write pixels to the same framebuffer and all of which check Z-values against, and write Z-values to, the same Z-buffer.

Now, I knew this would be problematic from the get-go, but I wanted to see what would happen. Sure enough, when drawing two quads (one behind the other), some of the pixels from the background quad would occasionally bleed through the foreground quad, unless, of course, I only had one worker thread. So, to fix this problem, I decided to use a mutex. I knew this would be extremely slow, but I wanted to do it anyway just to demonstrate that I had a handle on what the problem really was. The fix was simple: just wrap access to the Z-buffer with a mutex. But to my great surprise, this didn't fix the problem at all! So the question is: why?!
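In simplified form, the per-pixel plotting looked something like this (the buffer names and layout are made up for illustration, but the structure is the important part: only the Z-buffer access is inside the lock):

```cpp
#include <cstdint>
#include <mutex>
#include <vector>

int width = 0, height = 0;          // render target dimensions
std::vector<float>    zBuffer;      // one depth value per pixel
std::vector<uint32_t> frameBuffer;  // one packed color per pixel

std::mutex zMutex;                  // the "fix": guard only the Z-buffer

// Called by every worker thread for each pixel it rasterizes.
void plotPixel(int x, int y, float z, uint32_t color)
{
    bool visible;
    {
        // Depth test and depth write are done under the lock...
        std::lock_guard<std::mutex> lock(zMutex);
        size_t i = size_t(y) * width + x;
        visible = z < zBuffer[i];
        if (visible)
            zBuffer[i] = z;
    }
    // ...but the color write happens outside it.
    if (visible)
        frameBuffer[size_t(y) * width + x] = color;
}
```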

I have a hypothesis, but it is a disturbing one. My guess is that at least one of two things is happening. First, a thread may write to the Z-buffer, but that write isn't necessarily flushed from the CPU's cache back to the Z-buffer's memory by the time another thread goes to read it. Second, a thread may read from the Z-buffer, but do so in prefetched chunks that assume no other thread is writing to it. In either case, even if the mutex is doing its job correctly, there are still going to be cases where we're either reading the wrong Z-value or failing to write a Z-value.

What may support this hypothesis is that when I unnecessarily widened the mutex lock time, it not only made my rendering slower, it also appeared to fix the Z-buffer issue described above. Why? My guess is that the extra lock time made it more likely that the Z-buffer writes were flushed.

Anyhow, this is disturbing to me, because I don't know why this isn't a problem I've run into before with, for example, the simple queues I've been using to communicate between threads for years. Why wasn't the CPU lazy about flushing its cache with my linked-list pointers?

So I looked around for ways to add a memory fence, flush a write operation, or make sure a read operation always pulled from memory (e.g., by using the "volatile" keyword), but none of it was trivial or seemed to help.

What am I not understanding here? Do I really just have no idea what's going on? Thanks for any light you can shed on this.

CodePudding user response:

The fix was simple: just wrap access to the Z-buffer with a mutex.

This is not enough -- the mutex needs to cover both the Z-buffer access and the framebuffer update, making the whole operation (check & update the Z-buffer, conditionally update the framebuffer) atomic. Otherwise there is a danger that the Z-buffer and framebuffer updates will "cross" and happen in the reverse order. Something like:

  • thread 1: check/update z buffer (hit -- pixel is closer than previous)
  • thread 2: check/update z buffer (hit -- pixel is closer than previous)
  • thread 2: update framebuffer
  • thread 1: update framebuffer

and you end up with thread 1's color in the framebuffer even though thread 2's pixel is closer in Z.
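Concretely, using the same hypothetical names as the sketch in the question, the per-pixel routine has to keep the color write inside the same critical section as the depth test and depth write:

```cpp
// One critical section covering the depth test, the depth write,
// and the color write, so the whole read-modify-write is atomic.
void plotPixel(int x, int y, float z, uint32_t color)
{
    std::lock_guard<std::mutex> lock(zMutex);
    size_t i = size_t(y) * width + x;
    if (z < zBuffer[i]) {        // depth test
        zBuffer[i] = z;          // depth write
        frameBuffer[i] = color;  // color write, still under the lock
    }
}
```

It will still be painfully slow, since a per-pixel lock serializes nearly all the work, but it removes the actual race: no other thread can slip its framebuffer write between your depth test and your color write.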
