a. Does accessing a memory location under a mutex lock mean that whatever the critical code does to the mutex-protected variables will end up in main memory, rather than only being updated inside the thread's cache or registers without a fresh copy of the values ever reaching main memory?
b. If that's the case, aren't we effectively running the critical code as if we didn't have a cache (at least no cache locations for the mutex-protected variables)?
c. And if that is the case, isn't the critical code heavyweight code that needs to be as small as possible, considering the continued need to read from and write to main memory at least at the beginning and end of the mutex-locked session?
CodePudding user response:
a. Does accessing a memory location under a mutex lock mean that whatever the critical code does to the mutex-protected variables will end up in main memory, rather than only being updated inside the thread's cache or registers without a fresh copy of the values ever reaching main memory?
A correctly implemented mutex guarantees that previous writes are visible to other agents (e.g. other CPUs) when the mutex is released. On systems with cache coherency (e.g. 80x86), modifications are visible as soon as they're in a cache, and it doesn't matter whether they have reached main memory.
Essentially (over-simplified), for cache coherency, when another CPU wants the modified data it broadcasts a request (like "Hey, I want the data at address 123456"). If the data is in another CPU's cache, that CPU responds with "Here's the data you wanted"; if the data isn't in any cache, the memory controller responds with "Here's the data you wanted". Either way the requesting CPU gets the most recent version of the data, regardless of where the data was or what responded to the request. In practice it's a lot more complex - I'd recommend reading about the MESI cache control protocol if you're interested ( https://en.wikipedia.org/wiki/MESI_protocol ).
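As a minimal sketch of that guarantee (not from the original answer; the names g_lock, g_value and g_ready are invented for the example), the unlock below does not force anything to RAM on a cache-coherent machine - it only guarantees that a thread which later acquires the same mutex observes the writes made before the release:

```cpp
#include <iostream>
#include <mutex>
#include <thread>

std::mutex g_lock;               // invented names for this sketch
int  g_value = 0;
bool g_ready = false;

void producer() {
    std::lock_guard<std::mutex> guard(g_lock);  // acquire
    g_value = 42;                               // may only reach this core's cache for now
    g_ready = true;
}   // release: both writes become visible to any thread that later acquires g_lock

void consumer() {
    for (;;) {
        std::lock_guard<std::mutex> guard(g_lock);  // acquire
        if (g_ready) {                              // if we see the flag, g_value == 42 is guaranteed
            std::cout << g_value << '\n';
            return;
        }
    }
}

int main() {
    std::thread t1(producer), t2(consumer);
    t1.join();
    t2.join();
}
```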
b. If that's the case, aren't we effectively running the critical code as if we didn't have a cache (at least no cache locations for the mutex-protected variables)?
If that were the case (e.g. if there's no cache coherency), something (the code that releases the mutex) would have to ensure that modified data is written back to RAM before the mutex can be acquired by something else. This doesn't prevent the cache from being used inside the critical section (e.g. the critical section could write to the cache, and the modified data could be sent from cache to RAM afterwards).
The cost would depend on various factors (CPU speed, cache speed and memory speed, and whether the cache is "write back" or "write through", and how much data is modified). For some cases (relatively slow CPU with write-through caches) the cost may be almost nothing.
c. And if that is the case, isn't the critical code heavyweight code that needs to be as small as possible, considering the continued need to read from and write to main memory at least at the beginning and end of the mutex-locked session?
It's not as heavy as not using caches.
Synchronizing access (regardless of how it's done) is always going to be more expensive than not synchronizing access (and crashing because all your data got messed up). ;-)
One of the challenges of multi-threaded code is finding a good compromise between the cost of synchronization and parallelism - a small number of locks (or a single global lock) reduces the cost of synchronization but limits parallelism (threads get nothing done while waiting to acquire a lock); and a large number of locks increases the cost of synchronization (e.g. acquiring more locks is more expensive than acquiring one) but allows more parallelism.
Of course parallelism is also limited by the number of CPUs you have; which means that a good compromise for one system (with few CPUs) may not be a good compromise on another system (with lots of CPUs).
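A hedged C++ sketch of that trade-off (the types Coarse and Striped and the constant kStripes are invented for illustration): the coarse version acquires one lock cheaply but serializes every update, while the striped version pays for more lock objects and index arithmetic but lets updates to different stripes proceed in parallel:

```cpp
#include <array>
#include <cstddef>
#include <mutex>

struct Coarse {                               // one global lock: cheap to acquire, little parallelism
    std::mutex lock;
    std::array<long, 1024> counters{};
    void add(std::size_t i, long v) {
        std::lock_guard<std::mutex> g(lock);  // every thread contends on the same lock
        counters[i] += v;
    }
};

struct Striped {                              // many locks: more overhead, more parallelism
    static constexpr std::size_t kStripes = 16;
    std::array<std::mutex, kStripes> locks;
    std::array<long, 1024> counters{};
    void add(std::size_t i, long v) {
        std::lock_guard<std::mutex> g(locks[i % kStripes]);  // only same-stripe updates contend
        counters[i] += v;
    }
};
```

Which version wins depends on the contention and on how many CPUs are available, which is exactly the compromise described above.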
CodePudding user response:
a. Does accessing a memory location under a mutex lock mean that whatever the critical code does to the mutex-protected variables will end up in main memory, rather than only being updated inside the thread's cache or registers without a fresh copy of the values ever reaching main memory?
Variables written inside a critical section are not guaranteed to end up in main memory (e.g. RAM). They can sit in the CPU cache (and likely do). They can also be kept in a register if the variable is local, or more generally if it is guaranteed not to be visible to other threads (i.e. not shared). However, they cannot be stored only in a register if they are shared with other threads. In that case, the lock operations act as memory barriers. This is enough for other threads to see the updated content (thanks to cache coherence). Please note that not all processors actually need a (slow) memory-barrier instruction, depending on their memory consistency model (for example, x86/x86-64 processors do not, as opposed to ARM or PowerPC processors).
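A small hedged C++ sketch of that distinction (g_lock and g_shared are invented names): the purely local variable is free to live in a register for the whole loop, while the shared variable must be genuinely loaded and stored around the lock, which acts as a compiler barrier and, on architectures that need it, a hardware memory barrier:

```cpp
#include <mutex>
#include <thread>

std::mutex g_lock;   // invented names for this sketch
int g_shared = 0;

int work() {
    int local = 0;                        // not visible to other threads:
    for (int i = 0; i < 100; ++i)         // free to stay in a register the whole time
        local += i;

    std::lock_guard<std::mutex> g(g_lock);
    g_shared += local;                    // shared: a real load and store happen here;
                                          // the lock/unlock act as compiler barriers and,
                                          // where the architecture needs it, hardware barriers
    return local;
}

int main() {
    std::thread t1(work), t2(work);
    t1.join();
    t2.join();
}
```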
If you want a variable to actually go through the memory hierarchy, you probably need to use volatile. Such a variable will not be kept in a register, but it may still not reach main memory. You generally do not need that unless you want to read the variable from a debugger or you want to interact with low-level devices through hardware-mapped memory.
c. And if that is the case, isn't the critical code heavyweight code that needs to be as small as possible, considering the continued need to read from and write to main memory at least at the beginning and end of the mutex-locked session?
Well, in any case, critical sections need to be as short as possible so the application can scale properly (due to Amdahl's law). The final barrier can be expensive, but the memory loads/stores generally are not, as long as there are no cache misses, since they can often be done in parallel.
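A hedged C++ sketch of that advice (expensive_format, g_log_lock and g_log are invented names): do the slow, thread-private work outside the lock and keep only the shared-state update inside the critical section:

```cpp
#include <mutex>
#include <string>
#include <utility>
#include <vector>

std::mutex g_log_lock;               // invented names for this sketch
std::vector<std::string> g_log;

std::string expensive_format(int id) {        // stand-in for heavy, lock-free work
    return "event " + std::to_string(id);
}

void record(int id) {
    std::string line = expensive_format(id);  // slow part done outside the lock,
                                              // in parallel with other threads
    std::lock_guard<std::mutex> g(g_log_lock);
    g_log.push_back(std::move(line));         // critical section is just one push_back
}

int main() {
    record(1);
    record(2);
}
```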