Deadlock when de-initializing instances with long-running threads

Time:09-22

I often run into the following problem at work: I need to create a class that owns long-running threads which work with shared member variables, and that also has some kind of stop() method which stops all the threads and de-initializes the instance.

The problem here is the corner case - when the instance has to be de-initialized:

  • The long-running threads work with shared variables, so there should be a class-wide mutex that the threads and the other methods take.
  • When an instance is de-initializing, the long-running threads have to be told to halt, so there should be some kind of stop semaphore (a stop flag) for that.
  • But the de-initialization method should switch the stop semaphore outside the mutex's guard; otherwise joining the threads while holding the mutex would deadlock.
  • However, if the de-initialization method first switches the stop semaphore and then takes the mutex, a deadlock is still possible: a long-running thread could check the semaphore just before it is switched and then lose the race for the mutex to the de-init method.

What is the best way to solve this de-init deadlock problem? I am especially keen to find an authoritative source.

CodePudding user response:

It is possible to use a barrier to set up a rendezvous that all the threads reach before ending.
At the beginning of the mutual exclusion region, a stop flag is checked/set. If the flag is set, the running thread releases the mutex to leave the mutual exclusion region and calls pthread_barrier_wait(). At some point all the threads will have reached the barrier (i.e. the barrier counter drops to 0), and exactly one of them will get the PTHREAD_BARRIER_SERIAL_THREAD return code, after which it cleans up the data structures.
This assumes that the number of running threads is known when the barrier is initialized (the count parameter passed to pthread_barrier_init()) and that the running threads regularly enter the mutual exclusion region to check the stop flag.

CodePudding user response:

I'm not sure what the question is: a coding pattern for shutting down threads, or avoiding deadlock while doing so. I can only appeal to authority on the latter.

Coffman, Elphick, and Shoshani, in "System Deadlocks," Computing Surveys, Vol. 3, No. 2, June 1971, pp. 71-76, stated the following four necessary conditions that must be in effect for a deadlock to exist.

  1. Mutual exclusion
  2. Wait for
  3. No preemption
  4. Circular wait

Remove any one of those conditions and you can't deadlock. If you're looking for an authoritative answer on how to handle your circumstance, there isn't enough detail in your question to make a specific recommendation. Maybe you don't care about the reasons for deadlock, but I'll use these conditions to give context to a few solutions. I'll also tell you what I do for the simple case of a class that has long-running threads.

  1. Removing mutual-exclusion - if state is only being read and not written, a read/write lock can be used; when it is acquired for read, there is no mutual exclusion with other readers.
  2. Removing wait-for - if the condition being checked has not been met, release and reacquire the mutex, allowing other threads to acquire it and modify state until the condition you're waiting for has been met. This is what a condition variable does for you (e.g., a pthread_cond_t). It allows you to wait for some condition to be true (e.g., number of running threads is 0) while not holding the mutex that guards the state you're waiting to change.
  3. Allowing preemption - I've never seen an O/S mechanism to directly support this. You need locks that can be canceled - databases do this.
  4. Removing circular-wait - this is usually how deadlock is avoided. The classic method is to control the order in which locks are grabbed. When grabbing more than one lock, always grab them in the same order. However, it is best to not hold more than one lock at a time, using finer-grained locks. The answer from Rachid K. does this. The class mutex protects the stop flag, and a new lock in the form of a barrier protects its own state.
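
To illustrate the lock-ordering point in item 4, here is a minimal sketch, assuming two mutexes that must sometimes be held together. The helpers lock_pair() and unlock_pair(), and the choice of ordering by address, are my own illustration rather than anything from the question.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define ITERATIONS 10000

/* Hypothetical helper: always acquire two mutexes in ascending address order,
 * so no two callers can ever wait on each other in a cycle. */
void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    assert(a != b);
    if ((uintptr_t)a > (uintptr_t)b) {
        pthread_mutex_t *tmp = a;
        a = b;
        b = tmp;
    }
    pthread_mutex_lock(a);
    pthread_mutex_lock(b);
}

void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_unlock(a);
    pthread_mutex_unlock(b);
}

static pthread_mutex_t m1 = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t m2 = PTHREAD_MUTEX_INITIALIZER;
static long total = 0;   /* only touched while both mutexes are held */

static void *forward(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        lock_pair(&m1, &m2);
        total++;
        unlock_pair(&m1, &m2);
    }
    return NULL;
}

static void *reverse(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        lock_pair(&m2, &m1);   /* arguments swapped, lock order still the same */
        total++;
        unlock_pair(&m2, &m1);
    }
    return NULL;
}

/* Runs both threads to completion and returns the final counter. */
long run_lock_order_demo(void)
{
    pthread_t t1, t2;
    int rc = pthread_create(&t1, NULL, forward, NULL);
    assert(rc == 0);
    rc = pthread_create(&t2, NULL, reverse, NULL);
    assert(rc == 0);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return total;
}
```

If the two threads locked the mutexes directly in the order they name them, they could each hold one and wait for the other; routing both through lock_pair() removes the circular wait.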

Another choice is to not use the class mutex at all for the stop flag and use Burak Serdar's suggestion of an atomic. There the CPU does the locking to ensure consistent access. Atomics also can't be part of a circular-wait because the locking/unlocking is all hidden from your code.

Or you can keep the single class lock and implement your own barrier to remove the wait-for condition. The class mutex can protect both the stop flag and an active thread count. The condition variable allows you to wait while not holding the class mutex, yet the mutex is still used to protect the class state when it is written and read. If you're using a pthread condition variable (pthread_cond_t), when you call pthread_cond_wait() you supply both the condition variable and the mutex you're holding - the O/S will release the mutex before putting your thread to sleep and reacquire it when it's woken back up. Other threads can acquire the class lock, modify the active thread count, call pthread_cond_signal() to wake up the waiting thread, and then release the class lock. The de-initialize code will wake up holding the lock again and recheck whether the condition has been satisfied (i.e., the count is now zero).
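
A minimal sketch of that single-lock variant, assuming a fixed number of workers: the names class_mutex, all_stopped, active_threads and run_condvar_demo() are hypothetical, and the final pthread_join() loop is only there to reclaim thread resources in the demo.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

#define NUM_WORKERS 3

/* One class-wide mutex guards both the stop flag and the active thread count. */
static pthread_mutex_t class_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t all_stopped = PTHREAD_COND_INITIALIZER;
static bool stop_requested = false;
static int active_threads = 0;

static void *worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&class_mutex);
    while (!stop_requested) {
        /* ... work on shared state would go here ... */
        pthread_mutex_unlock(&class_mutex);   /* let other threads in */
        sched_yield();
        pthread_mutex_lock(&class_mutex);
    }
    active_threads--;
    if (active_threads == 0)
        pthread_cond_signal(&all_stopped);    /* wake the de-init code */
    pthread_mutex_unlock(&class_mutex);
    return NULL;
}

/* Starts the workers, stops them, and returns the remaining active count. */
int run_condvar_demo(void)
{
    pthread_t tids[NUM_WORKERS];

    pthread_mutex_lock(&class_mutex);
    active_threads = NUM_WORKERS;             /* counted before the threads run */
    pthread_mutex_unlock(&class_mutex);
    for (int i = 0; i < NUM_WORKERS; i++) {
        int rc = pthread_create(&tids[i], NULL, worker, NULL);
        assert(rc == 0);
    }

    pthread_mutex_lock(&class_mutex);
    stop_requested = true;
    /* pthread_cond_wait() releases class_mutex while sleeping and reacquires
     * it before returning, so the workers can still update the count. */
    while (active_threads > 0)
        pthread_cond_wait(&all_stopped, &class_mutex);
    int remaining = active_threads;
    pthread_mutex_unlock(&class_mutex);

    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(tids[i], NULL);          /* reclaim thread resources */
    return remaining;
}
```

The while loop around pthread_cond_wait() is the recheck mentioned above: the de-init code only proceeds once it has seen the count at zero while holding the lock.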

FWIW: The way I handle this (when using pthreads) is with an atomic stop flag and calling pthread_join() to wait for each thread to exit. No mutexes are directly involved, partly because my classes and threads are constructed to not require a class-level lock on shutdown. I'm skipping over the details of how I get the threads to check the stop flag - that varies widely based on what the thread is for and may involve their own locks.
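
Roughly, and only as a sketch assuming C11 atomics are available, that pattern looks like the following. The service_t struct and the service_start(), service_stop() and run_atomic_demo() functions are invented for the example, and the worker body is a placeholder for real work.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NUM_WORKERS 3

/* Hypothetical "class": owns its threads and an atomic stop flag. */
typedef struct {
    pthread_t tids[NUM_WORKERS];
    int started;               /* number of threads actually created */
    atomic_bool stop;
    atomic_long iterations;    /* placeholder for the real shared work */
} service_t;

static void *worker(void *arg)
{
    service_t *svc = arg;
    while (!atomic_load(&svc->stop))
        atomic_fetch_add(&svc->iterations, 1);
    return NULL;
}

/* Returns the number of threads that were started. */
int service_start(service_t *svc)
{
    atomic_init(&svc->stop, false);
    atomic_init(&svc->iterations, 0);
    svc->started = 0;
    for (int i = 0; i < NUM_WORKERS; i++) {
        if (pthread_create(&svc->tids[svc->started], NULL, worker, svc) != 0)
            break;
        svc->started++;
    }
    return svc->started;
}

/* Sets the flag without taking any mutex, then joins every started thread.
 * Returns the number of threads successfully joined. */
int service_stop(service_t *svc)
{
    atomic_store(&svc->stop, true);
    int joined = 0;
    for (int i = 0; i < svc->started; i++) {
        if (pthread_join(svc->tids[i], NULL) == 0)
            joined++;
    }
    return joined;
}

int run_atomic_demo(void)
{
    service_t svc;
    int started = service_start(&svc);
    assert(started == NUM_WORKERS);
    return service_stop(&svc);
}
```

Because service_stop() holds no lock while joining, a worker can never be blocked on something the stopping thread owns, so there is nothing for a circular wait to form around.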

Regardless, I think that if you can, join is the way to go because it is simple. I don't know of any threading library in any language that doesn't support blocking on a thread until it exits. In unmanaged languages, if you can join, it's often required to do so to avoid leaking resources. The join call does the per-thread resource cleanup. With pthreads, after you call pthread_detach() you can't join, and the exiting thread then does its own cleanup.
