multithreading: which variables need mutex protection when communicating via a condition variable?

I have a question on the interplay between condition variables and their associated mutex locks (it arose from a simplified example I was presenting in a lecture, confusing myself in the process). Two threads exchange data (let's say an int n indicating an array size, and a double *d which points to the array) via shared variables in the memory of their process. I use an additional int flag (initially flag = 0) to indicate when the data (n and d) is ready (flag = 1), a pthread_mutex_t mtx, and a condition variable pthread_cond_t cnd.

This part is from the receiver thread which waits until flag becomes 1 under the protection of the mutex lock, but afterward processes n and d without protection:

while (1) {
    pthread_mutex_lock(&mtx);
    while (!flag) {                    /* recheck: wakeups can be spurious */
      pthread_cond_wait(&cnd, &mtx);   /* atomically unlocks mtx and blocks */
    }
    pthread_mutex_unlock(&mtx);
    // use n and d
}

This part is from the sender thread which sets n and d beforehand without protection by the mutex lock, but sets flag while the mutex is locked:

n = 10;
d = malloc(n * sizeof(double));   /* d points to double, so allocate doubles */
pthread_mutex_lock(&mtx);
flag = 1;                         /* mark the data as ready */
pthread_cond_signal(&cnd);
pthread_mutex_unlock(&mtx);

It is clear that you need the mutex in the sender since otherwise you have a "lost wakeup call" problem (see https://stackoverflow.com/a/4544494/3852630).
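For illustration, here is a sketch of a hypothetical broken sender that can lose the wakeup: flag is written and the signal is sent without holding mtx, so the signal can fire between the receiver's test of flag and its call to pthread_cond_wait().

n = 10;
d = malloc(n * sizeof(double));
flag = 1;                    /* no mutex: races with the receiver's test */
pthread_cond_signal(&cnd);   /* may fire while nobody is waiting yet */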

My question is different: I'm not sure which variables have to be set (in the sender thread) or read out (in the receiver thread) inside the region protected by the mutex lock, and which variables don't need to be protected by the mutex lock. Is it sufficient to protect flag on both sides, or do n and d also need protection?

Memory visibility (see the rule below) between sender and receiver should be guaranteed by the call to pthread_cond_signal(), so the necessary pairwise memory barriers should be there (in combination with pthread_cond_wait()).

I'm aware that this is an unusual case. Typically my applications modify a task list in the sender and pop tasks from the list in the receiver, and the associated mutex lock protects the list operations on both sides. However, I'm not sure what would be necessary in the case above. Could the danger be that the compiler (which is not aware of the concurrent access to the variables) optimizes away the write and/or read accesses? Are there other problems if n and d are not protected by the mutex?

Thanks for your help!

David R. Butenhof: Programming with POSIX Threads, p. 89: "Whatever memory values a thread can see when it signals ... a condition variable can also be seen by any thread that is awakened by that signal ...".

CodePudding user response:

Is it sufficient to protect flag on both sides, or do n and d also need protection?

In principle, if you use your mutex, CV, and flag in such a way that the writer does not modify n, d, or *d after setting flag under mutex protection, and the reader cannot access n, d, or *d until after it observes the modification of flag (under protection of the same mutex), then you can rely on the reader observing the writer's last-written values of n, d, and *d. This is more or less a hand-rolled semaphore.
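For comparison, here is a minimal sketch of the same handshake expressed with an actual POSIX semaphore from <semaphore.h> instead of the mutex/CV/flag triple, assuming a sem_t ready initialized to 0 with sem_init(&ready, 0, 0); sem_post() and sem_wait() are among the functions POSIX requires to synchronize memory.

/* sender: publish the data, then post */
n = 10;
d = malloc(n * sizeof(double));
sem_post(&ready);            /* release: n, d, and *d become visible */

/* receiver: wait, then consume */
sem_wait(&ready);            /* acquire: pairs with the sender's post */
/* use n and d */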

In practice, you should use whichever system synchronization objects you have chosen (mutexes, semaphores, etc.) to protect all the shared data. Doing so is easier to reason about and less prone to spawn bugs. Often, it's simpler, too.
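Applied to the code from the question, that means moving the writes and reads of n and d inside the critical region. A sketch (local_n and local_d are hypothetical local copies, so the lock need not be held while processing):

/* sender */
pthread_mutex_lock(&mtx);
n = 10;
d = malloc(n * sizeof(double));
flag = 1;
pthread_cond_signal(&cnd);
pthread_mutex_unlock(&mtx);

/* receiver */
pthread_mutex_lock(&mtx);
while (!flag) {
    pthread_cond_wait(&cnd, &mtx);
}
int local_n = n;             /* copy shared data under the lock */
double *local_d = d;
pthread_mutex_unlock(&mtx);
/* use local_n and local_d */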

CodePudding user response:

At the low level of memory ordering, the rules aren't specified in terms of "the region protected by a mutex", but in terms of synchronization. Whenever two different threads access the same non-atomic object, then unless both accesses are reads, there must be a synchronization operation in between, to ensure that one of the accesses definitely happens before the other.

The way to achieve synchronization is to have one thread perform a release operation (such as unlocking a mutex) after accessing the shared variables, and have the other thread perform an acquire operation (such as locking a mutex) before accessing them, in such a way that program logic guarantees the acquire must have happened after the release.

And here, you have that. The sender thread does perform a mutex unlock after accessing n and d (last line of the sender code). And the receiver does perform a mutex lock before accessing them (inside pthread_cond_wait). The setting and testing of flag ensures that, when we exit the while (!flag) loop, the most recent lock of the mutex by the receiver did happen after the unlock by the sender. So synchronization is achieved.
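The same release/acquire pairing can be written out directly with C11 atomics. A sketch, assuming flag is declared _Atomic int; this replaces the blocking wait with a spin and is only meant to make the ordering explicit:

#include <stdatomic.h>

/* sender */
n = 10;
d = malloc(n * sizeof(double));
atomic_store_explicit(&flag, 1, memory_order_release);      /* release */

/* receiver */
while (!atomic_load_explicit(&flag, memory_order_acquire))  /* acquire */
    ;                        /* spin; a real program would block instead */
/* use n and d: the acquire load synchronized with the release store */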

The compiler and CPU must not perform any optimization that would defeat this, so in particular they can't optimize away the accesses to n and d, or reorder them around the synchronizing operations. This is usually ensured by treating the release/acquire operations as barriers. Any accesses to potentially shared objects that occur in program order before a release barrier must actually be performed and flushed to coherent memory/cache prior to anything that comes after the barrier (in this case, before any other thread may see the mutex as unlocked). If special CPU instructions are needed to ensure global visibility, the compiler must emit them. And vice versa for acquire barriers: any access that occurs in program order after an acquire barrier must not be reordered before it.

To say it another way, the compiler treats the release barrier as an operation that may potentially read all of memory; so all variables must be written out before that point, so that the actual contents of memory at that point will match what an abstract machine would have. Likewise, an acquire barrier is treated as an operation that may potentially write all of memory, and all variables must be reloaded from memory afterwards. The only exception would be local variables for which the compiler can prove that no other thread could legally know their address; those can be safely kept in registers or otherwise reordered.
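Conversely, with no synchronization at all, the compiler is free to apply exactly those optimizations. A sketch of a receiver that is broken for this reason (a data race on a plain int flag):

/* broken: no barrier, so the compiler may load flag once,
   keep it in a register, and spin forever */
while (!flag)
    ;
/* even if we get past the loop, the reads of n and d
   may be reordered or see stale values */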


It's true that, after the synchronizing lock operation, the receiver happened to unlock the mutex again, but that isn't relevant here. That unlock doesn't synchronize with anything in this particular program, and it has no impact on its execution. Likewise, for purposes of synchronizing access to n and d, it didn't matter whether the sender locked the mutex before or after accessing them. (Though it was important that the sender locked the mutex before writing to flag; that's how we ensure that any earlier reads of flag by the receiver really did happen before the write, instead of racing with it.)

The principle that "accesses to shared variables should be inside a critical region protected by a mutex" is just a higher-level abstraction that is one way to ensure that accesses by different threads always have a synchronizing unlock-lock pair in between them. And in cases where the variables could be accessed over and over again, you normally would want a lock before, and an unlock after, every such access, which is equivalent to the "critical section" principle. But this principle is not itself the fundamental rule.

That said, in real life you probably do want to follow this principle as much as possible, since it will make it easier to write correct code and avoid subtle bugs, and likewise make it easier for other programmers to verify that your code is correct. So even though it is not strictly necessary in this program for the accesses to n and d to be "protected" by the mutex, it would probably be wise to do so anyway, unless there is a significant and measurable benefit (performance or otherwise) to be gained by avoiding it.


The condition variable doesn't play a role in the race avoidance here, except insofar as the pthread_cond_wait locked the mutex. Functionally, it is equivalent to having the receiver simply do a tight spin loop of "lock mutex; test flag; unlock mutex", but without wasting all those CPU cycles.
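In other words, the receiver behaves like this sketch, just without burning CPU cycles:

for (;;) {
    pthread_mutex_lock(&mtx);     /* acquire */
    int done = flag;
    pthread_mutex_unlock(&mtx);
    if (done)
        break;                    /* flag was set: n and d are visible */
}
/* use n and d */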

And I think that the quote from Butenhof is mistaken, or at best misleading. My understanding is that pthread_cond_signal by itself is not guaranteed to be a barrier of any kind, and in fact has no memory ordering effect whatsoever. POSIX doesn't directly address memory ordering, but this is the case for the standard C equivalent cnd_signal. There would be no point in having pthread_cond_signal ensure global visibility unless you could make use of it by assuming that all those accesses were visible by the time the corresponding pthread_cond_wait returns. But since pthread_cond_wait can wake up spuriously, you cannot ever safely make such an assumption.

In this program, the necessary release barrier in the sender comes not from pthread_cond_signal, but rather from the subsequent pthread_mutex_unlock.
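To make that concrete: in this particular program the sender could even signal after unlocking, and the handoff would still be correct, because the receiver only tests flag while holding mtx, and the unlock is what publishes the data. A sketch (signaling without holding the mutex is permitted by POSIX):

n = 10;
d = malloc(n * sizeof(double));
pthread_mutex_lock(&mtx);
flag = 1;
pthread_mutex_unlock(&mtx);   /* release: publishes n, d, and flag */
pthread_cond_signal(&cnd);    /* wakeup only; contributes no ordering */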
