I'm trying to improve my understanding of synchronization primitives in C++. I've measured the latencies of various concurrent operations, such as:
- For a raw std::mutex, the time between .unlock() and the return of .lock()
- For a std::condition_variable, the time between .notify_one() and the return of .wait()
- For a std::binary_semaphore, the time between .release() and .acquire()
- For a std::atomic_flag, the time from .clear() and .notify_one() to .wait(), as well as from .test_and_set() and .notify_one() to .wait()
All of these latencies fall in the same range (~4µs-15µs). After digging a bit I found that semaphores are implemented with an atomic, and condition_variables boil down to a mutex. So it comes down to atomics vs. mutex. When stepping into the relevant functions (on Windows/MSVC), I found that atomics use WaitOnAddress/WakeByAddress while mutex uses SRW locks (AcquireSRWLockExclusive).
Naively I would have assumed atomics (especially atomic_flag) to have the best latency characteristics of all since they're so limited in what they do. So my questions:
- Why are they equally fast? Might be my limited testing.
- What are the differences between WaitOnAddress/WakeByAddress and SRW locks? They're both limited to a single process, I think. I only found this article on WaitOnAddress, but it barely touches on differences to SRW locks.
Answer:
In an uncontended state an SRWLock is just an atomic int. So it's unsurprising that it performs the same as an atomic.
You should see very different timings once you introduce some contention in your tests.
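One way to introduce contention is to have several threads repeatedly take the same lock. The following is a minimal sketch under that assumption; the names ContentionResult and run_contended are hypothetical, and the elapsed time also includes thread start-up and join, so it is only useful for comparing runs against each other.

```cpp
#include <chrono>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical result type: total wall time and the final counter value.
struct ContentionResult {
    std::chrono::nanoseconds elapsed;
    std::uint64_t counter;
};

// Runs thread_count threads that each increment a shared counter
// iterations_per_thread times while holding the same std::mutex.
ContentionResult run_contended(unsigned thread_count,
                               std::uint64_t iterations_per_thread) {
    std::mutex m;
    std::uint64_t counter = 0;
    std::vector<std::thread> workers;
    workers.reserve(thread_count);

    const auto start = std::chrono::steady_clock::now();
    for (unsigned t = 0; t < thread_count; ++t) {
        workers.emplace_back([&] {
            for (std::uint64_t i = 0; i < iterations_per_thread; ++i) {
                std::lock_guard<std::mutex> guard(m);  // contended with >1 thread
                ++counter;
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    const auto stop = std::chrono::steady_clock::now();

    return {std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start),
            counter};
}
```

Comparing run_contended(1, n) with run_contended(8, n) for the same n should show how the cost per lock operation changes once threads start competing for the lock.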
An SRWLock is essentially a Windows version of a futex - a synchronization primitive that remains fully in user-space until contention occurs.
WaitOnAddress / WakeByAddress and SRWLock are very similar internally, but the use-case is different - SRWLock directly implements a mutex/shared_mutex and WaitOnAddress / WakeByAddress are more useful for condition_variable.