Home > Software engineering >  Does libc counting_semaphore have deadlock issue?
Does libc counting_semaphore have deadlock issue?

Time:12-12

libc counting_semaphore::release:

    void release(ptrdiff_t __update = 1)
    {
        if(0 < __a.fetch_add(__update, memory_order_release))
            ;
        else if(__update > 1)
            __a.notify_all();
        else
            __a.notify_one();
    }

Notifies only if internal count was zero before increment, notifies more then one waiter only if increment is more than one.

libc counting_semaphore::acquire:

    void acquire()
    {
        auto const __test_fn = [=]() -> bool {
            auto __old = __a.load(memory_order_relaxed);
            return (__old != 0) && __a.compare_exchange_strong(__old, __old - 1, memory_order_acquire, memory_order_relaxed);
        };
        __cxx_atomic_wait(&__a.__a_, __test_fn);
    }

Waits for count to be non-zero, and tries to CAS it with decremented value.

Now please look into the following 3-threaded case:

counting_semaphore<10> s;

T1: { s.acquire(); /*signal to T3*/ }
T2: { s.acquire(); /*signal to T3*/ }
T3: { /*wait until both signals*/ s.release(1); s.release(1); }

Initially:

__a == 0 

(desired parameter passed as 0, any attempt to acquire would block)

Timeline

T1: enters wait
T2: enters wait
T3: fetch add 1 & returns 0, now __a == 1
T3: (0 < 0) is false, so notify_one happens
T1: unblocks from wait
T3: fetch add 1 & returns 1, now __a == 2
T3: (0 < 1) is true, so no notification
T1: loads 2
T1: cas 2 to 1 successfully
T1: returns from acquire
T2: still waits despite __a == 1

Does this look like a valid deadlock?


why I am asking here instead of reporting the issue?

I reported the issue quite some time ago, no reply so far. I want to understand if there's indeed a deadlock or I am missing something.

CodePudding user response:

The conditional if (0 < ...) is a problem, but it's not the only problem.

The effects of release are stated to be:

Atomically execute counter = update. Then, unblocks any threads that are waiting for counter to be greater than zero.

Note the words "any threads". Plural. This means that, even if a particular update value happened to be 1, all of the threads blocked on this condition must be notified. So calling notify_one is wrong, unless the implementation of notify_one always unblocks all waiting threads. Which would be an... interesting implementation of that function.

Even if you change notify_one to notify_all, that doesn't fix the problem. The logic of the condition basically assumes that thread notification should only happen if all of the threads (logically) notified by a previous release have escaped from their acquire calls. The standard requires no such thing.

  • Related