Synchronization primitives: Equal latencies between atomics and mutex lock

I'm trying to improve my understanding of synchronization primitives in C++. I've measured latencies of various concurrent operations, such as:

For a raw std::mutex, the time between .unlock() and the return of .lock()
For a std::condition_variable, the time between .notify_one() and the return of .wait()
For a std::binary_semaphore, the time between .release() and .acquire()
For a std::atomic_flag, the time from .clear() and .notify_one() to .wait() as well as from .test_and_set() and .notify_one() to .wait()

All of these latencies are identical (~4µs-15µs). After digging a bit, I found that semaphores are implemented with an atomic, and condition_variables boil down to a mutex. So it comes down to atomics vs. mutex. When stepping into the relevant functions (on Windows/MSVC), I found that atomics use WaitOnAddress/WakeByAddress while mutex uses SRW locks (AcquireSRWLockExclusive).

Naively I would have assumed atomics (especially atomic_flag) to have the best latency characteristics of all since they're so limited in what they do. So my questions:

Why are they equally fast? Might be my limited testing.
What are the differences between WaitOnAddress/WakeByAddress and SRW locks? They're both limited to a single process I think. I only found this article on WaitOnAddress, but it barely touches on differences to SRW locks.

Answer:

In an uncontended state an SRWLock is just an atomic int. So it's unsurprising that it performs the same as an atomic.

You should see very different timings once you introduce some contention in your tests.
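
A rough way to try this, as a sketch rather than a rigorous benchmark: time the same total number of locked increments once on a single thread and once split across several threads. The names `LockTiming` and `time_locked_increments` are made up for this example, and the measured time includes thread start-up and join.

```cpp
#include <chrono>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical result type: elapsed wall time and the final counter value.
struct LockTiming {
    std::int64_t total_ns;
    std::int64_t counter;
};

// Runs `threads` threads that each take the mutex `iters_per_thread` times.
// With threads == 1 the lock is never contended; with more threads it is.
inline LockTiming time_locked_increments(int threads, int iters_per_thread) {
    std::mutex m;
    std::int64_t counter = 0;

    const auto start = std::chrono::steady_clock::now();
    {
        std::vector<std::thread> pool;
        pool.reserve(static_cast<std::size_t>(threads));
        for (int t = 0; t < threads; ++t) {
            pool.emplace_back([&] {
                for (int i = 0; i < iters_per_thread; ++i) {
                    std::lock_guard<std::mutex> guard(m);
                    ++counter;           // protected by the mutex
                }
            });
        }
        for (auto& th : pool) {
            th.join();
        }
    }
    const auto stop = std::chrono::steady_clock::now();

    const auto ns =
        std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
    return LockTiming{ns.count(), counter};
}
```

Comparing `time_locked_increments(1, 400000).total_ns` against `time_locked_increments(4, 100000).total_ns` does the same amount of work both times; any difference is mostly down to contention on the lock (plus thread start-up cost).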

An SRWLock is essentially the Windows counterpart of a futex-based lock: a synchronization primitive that stays entirely in user space until contention occurs.

WaitOnAddress / WakeByAddress and SRWLock are very similar internally, but their use cases differ: SRWLock directly implements a mutex/shared_mutex, whereas WaitOnAddress / WakeByAddress are better suited to condition_variable-style waiting, where a thread blocks until a value changes.