I have a multithreaded application that uses barriers to synchronise worker threads.
At the end of function compute(), threads are cancelled:
...
for(int i=0;i<p;i++){
    printf("Thread %lu completed in %d passes\n",threads[i],find_tstat(threads[i])->count);
    pthread_cancel(threads[i]);
}
printf("================================================================\n");
return a;
Threads are interrupted in the middle of computation, so they may be in between barriers. This is likely what causes pthread_barrier_destroy() to hang: some pthread_barrier_wait() call has not returned yet.
The question is: how can I still destroy even if a wait() hasn't returned?
CodePudding user response:
The answer to your question is: you can't. POSIX says this about pthread_barrier_destroy():
The results are undefined if pthread_barrier_destroy() is called when any thread is blocked on the barrier
On Linux, cancellation is implemented using signals, and POSIX specifies the following for signals delivered to a thread inside pthread_barrier_wait():
If a signal is delivered to a thread blocked on a barrier, upon return from the signal handler the thread shall resume waiting at the barrier if the barrier wait has not completed (that is, if the required number of threads have not arrived at the barrier during the execution of the signal handler); otherwise, the thread shall continue as normal from the completed barrier wait. Until the thread in the signal handler returns from it, it is unspecified whether other threads may proceed past the barrier once they have all reached it.
A thread that has blocked on a barrier shall not prevent any unblocked thread that is eligible to use the same processing resources from eventually making forward progress in its execution. Eligibility for processing resources shall be determined by the scheduling policy.
CodePudding user response:
As the question is posed:
The question is: how can I still destroy even if a wait() hasn't returned?
the answer is "you can't", as your other answer explains.
However, with good enough record keeping, you can launch just enough extra threads specifically to wait at the barrier in order to let any other threads already waiting pass through. This would likely be tied together with code and data intended to provide for your threads to be shut down cleanly instead of being canceled, which is also something you should do.
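A minimal sketch of the clean-shutdown half of that idea, assuming Linux pthreads. It does not launch extra threads; instead it sidesteps the need for them by having the workers agree on the stop decision at the barrier itself, so that all of them leave on the same pass. The names NUM_WORKERS, stop_requested, stop_agreed, worker() and run_workers_then_stop() are hypothetical and not taken from the question's code.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

#define NUM_WORKERS 4                 /* hypothetical worker count */

static pthread_barrier_t barrier;
static atomic_bool stop_requested;    /* set by the controlling thread */
static bool stop_agreed;              /* written only between the two waits */
static int passes[NUM_WORKERS];       /* completed passes per worker */

static void *worker(void *arg)
{
    int id = *(int *)arg;

    for (;;) {
        /* ... one pass of the real computation would go here ... */
        passes[id]++;

        /* First wait: every worker has finished this pass. Exactly one
           of them samples the stop flag on behalf of all. */
        if (pthread_barrier_wait(&barrier) == PTHREAD_BARRIER_SERIAL_THREAD)
            stop_agreed = atomic_load(&stop_requested);

        /* Second wait: the sampled decision is visible to everyone, so
           all workers leave on the same pass and none is left blocked. */
        pthread_barrier_wait(&barrier);
        if (stop_agreed)
            return NULL;
    }
}

/* Returns the common pass count on success, -1 on failure. */
int run_workers_then_stop(void)
{
    pthread_t threads[NUM_WORKERS];
    int ids[NUM_WORKERS];

    atomic_store(&stop_requested, false);
    stop_agreed = false;
    if (pthread_barrier_init(&barrier, NULL, NUM_WORKERS) != 0)
        return -1;

    for (int i = 0; i < NUM_WORKERS; i++) {
        ids[i] = i;
        passes[i] = 0;
        if (pthread_create(&threads[i], NULL, worker, &ids[i]) != 0)
            abort();   /* sketch only: no recovery from a partial start */
    }

    atomic_store(&stop_requested, true);   /* ask, instead of pthread_cancel() */

    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(threads[i], NULL);

    /* Every worker has exited, so nothing can be blocked on the barrier. */
    if (pthread_barrier_destroy(&barrier) != 0)
        return -1;

    for (int i = 1; i < NUM_WORKERS; i++)
        if (passes[i] != passes[0])
            return -1;
    return passes[0];
}
```

Checking the flag individually after a single wait would not be enough: one worker could see the request and exit while another misses it and blocks forever at the next barrier. Letting one thread decide between two waits costs an extra barrier per pass but keeps the exit consistent.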
On the other hand, it's pretty easy to roll your own barrier with use of a condition variable and mutex, and the result is more flexible. You still should not be canceling threads, but you can make waits at a hand-rolled barrier such as I describe soft-cancelable. This would be my recommendation.
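One possible shape for such a barrier, as a sketch rather than the answerer's actual code. The type soft_barrier and the functions soft_barrier_init(), soft_barrier_wait(), soft_barrier_cancel(), soft_barrier_destroy() and soft_barrier_demo() are hypothetical names. The assumed design is that cancellation is permanent: once requested, every current and future wait returns false.

```c
#include <pthread.h>
#include <stdbool.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    unsigned        needed;      /* threads required to release the barrier */
    unsigned        waiting;     /* threads that have arrived in this cycle */
    unsigned long   generation;  /* bumped each time the barrier releases */
    bool            cancelled;   /* once set, every wait returns false */
} soft_barrier;

int soft_barrier_init(soft_barrier *b, unsigned needed)
{
    if (needed == 0 || pthread_mutex_init(&b->lock, NULL) != 0)
        return -1;
    if (pthread_cond_init(&b->cond, NULL) != 0) {
        pthread_mutex_destroy(&b->lock);
        return -1;
    }
    b->needed = needed;
    b->waiting = 0;
    b->generation = 0;
    b->cancelled = false;
    return 0;
}

/* Returns true if the barrier released normally, false if it was cancelled. */
bool soft_barrier_wait(soft_barrier *b)
{
    bool released = false;

    pthread_mutex_lock(&b->lock);
    if (!b->cancelled) {
        if (++b->waiting == b->needed) {
            /* Last arrival: start a new cycle and wake the others. */
            b->waiting = 0;
            b->generation++;
            pthread_cond_broadcast(&b->cond);
            released = true;
        } else {
            unsigned long my_generation = b->generation;
            /* The loop guards against spurious wakeups. */
            while (my_generation == b->generation && !b->cancelled)
                pthread_cond_wait(&b->cond, &b->lock);
            released = (my_generation != b->generation);
        }
    }
    pthread_mutex_unlock(&b->lock);
    return released;
}

/* Wakes every blocked waiter and makes all later waits fail fast. */
void soft_barrier_cancel(soft_barrier *b)
{
    pthread_mutex_lock(&b->lock);
    b->cancelled = true;
    pthread_cond_broadcast(&b->cond);
    pthread_mutex_unlock(&b->lock);
}

/* Call only after every thread that uses the barrier has been joined. */
void soft_barrier_destroy(soft_barrier *b)
{
    pthread_cond_destroy(&b->cond);
    pthread_mutex_destroy(&b->lock);
}

#define DEMO_WORKERS 3                /* hypothetical worker count */
static soft_barrier demo_barrier;

static void *demo_worker(void *arg)
{
    int *passes = arg;
    while (soft_barrier_wait(&demo_barrier))
        ++*passes;                    /* one pass of work per release */
    return NULL;
}

/* The barrier needs one more thread than ever arrives, so the workers
   can only get out through cancellation. Returns total passes, or -1. */
int soft_barrier_demo(void)
{
    pthread_t threads[DEMO_WORKERS];
    int passes[DEMO_WORKERS] = {0};
    int started = 0, total = 0;

    if (soft_barrier_init(&demo_barrier, DEMO_WORKERS + 1) != 0)
        return -1;
    while (started < DEMO_WORKERS &&
           pthread_create(&threads[started], NULL, demo_worker,
                          &passes[started]) == 0)
        started++;

    soft_barrier_cancel(&demo_barrier);
    for (int i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
        total += passes[i];
    }
    soft_barrier_destroy(&demo_barrier);
    return started == DEMO_WORKERS ? total : -1;
}
```

With a pthread_barrier_t the demo would hang, because the required thread count is never reached. Here the workers treat a false return from soft_barrier_wait() as the signal to exit, the controlling thread joins them, and only then is the barrier destroyed.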