I am encountering an issue where I have a hard time telling which synchronization primitive I should use.
I am creating n parallel threads that work on a region of memory, each is assigned to a specific part of this region and can accomplish its task independently from the other ones. At some point tho I need to collect the result of the work of all the threads, which is a good case for using barriers, this is what I'm doing.
I must use one of the n worker threads to collect the result of all their work, for this I have the following code that follows the computation code in my thread function:
if (pthread_barrier_wait(thread_args->barrier)) {
// Only gets called on the last thread that goes through the barrier
// This is where I want to collect the results of the worker threads
}
So far so good, but now is where I get stuck: the code above is in a loop as I want the threads to accomplish work again for a certain number of loop spins. The idea is that each time pthread_barrier_wait
unblocks it means all threads have finished their work and the next iteration of the loop / parallel work can start again.
The problem with this is that the result collector block statements are not guaranteed to execute before other threads start working on this region again, so there is a race condition. I am thinking of using a UNIX condition variable like this:
// This code is placed in the thread entry point function, inside
// a loop that also contains the code doing the parallel
// processing code.
if (pthread_barrier_wait(thread_args->barrier)) {
// We lock the mutex
pthread_mutex_lock(thread_args->mutex);
collectAllWork(); // We process the work from all threads
// Set ready to 1
thread_args->ready = 1;
// We broadcast the condition variable and check it was successful
if (pthread_cond_broadcast(thread_args->cond)) {
printf("Error while broadcasting\n");
exit(1);
}
// We unlock the mutex
pthread_mutex_unlock(thread_args->mutex);
} else {
// Wait until the other thread has finished its work so
// we can start working again
pthread_mutex_lock(thread_args->mutex);
while (thread_args->ready == 0) {
pthread_cond_wait(thread_args->cond, thread_args->mutex);
}
pthread_mutex_unlock(thread_args->mutex);
}
There is multiple issues with this:
- For some reason
pthread_cond_broadcast
never unlocks any other thread waiting onpthread_cond_wait
, I have no idea why. - What happens if a thread
pthread_cond_wait
s after the collector thread has broadcasted? I believewhile (thread_args->ready == 0)
andthread_args->ready = 1
prevents this, but then see next point... - On the next loop spin,
ready
will still be set to1
hence no thread will callpthread_cond_wait
again. I don't see any place where to properly setready
back to0
: if I do it in the else block afterpthread_cond_wait
, there is the possibility that another thread that wasn't cond waiting yet reads1
and starts waiting even if I already broadcasted from theif
block.
Note I am required to use barriers for this.
How can I solve this issue?
CodePudding user response:
You could use two barriers (work and collector):
while (true) {
//do work
//every thread waits until the last thread has finished its work
if (pthread_barrier_wait(thread_args->work_barrier)) {
//only one gets through, then does the collecting
collectAllWork();
}
//every thread will wait until the collector has reached this point
pthread_barrier_wait(thread_args->collect_barrier);
}
CodePudding user response:
You could use a kind of double buffering.
Each worker would have two storage slots for results. Between the barriers the workers would store their results to one slot while the collector would read results from the other slot.
This approach has a few advantages:
- no extra barriers
- no condition queues
- no locking
- slot identifier does not even have to be atomic because each thread could have it's own copy of it and toggle it whenever reaching a barrier
- much more performant as workers can work when collector is processing the other slot
Exemplary workflow:
Iteration 1.
- workers write to slot 0
- collector does nothing because no data is ready
- all wait for barrier
Iteration 2.
- worker write to slot 1
- collector reads from slot 0
- all wait for barrier
Iteration 3.
- workers write to slot 0
- collector reads from slot 1
- all wait for barrier
Iteration 4.
- go to iteration 2