Interlocked & Thread-Safe operations


1. Out of curiosity, what do operations like the following do behind the scenes when they get called, for example, from 2 or 3 threads at the same time?

Interlocked.Add(ref myInt, 24);

Interlocked.Increment(ref counter);

Does C# create an internal queue that tells, for example, Thread 2 "now it's your turn to do the operation", then tells Thread 1 "now it's your turn", and then Thread 3, so that they don't interfere with each other?

2. Why doesn't C# do this automatically?

Isn't it obvious that when a programmer writes something like the following inside a method used by multiple threads:

myByte++;

Sum = int1 + int2 + int3;

and these variables are shared with other threads, he wants each of these operations to be executed as an atomic operation, without interruptions?

Why does the programmer have to tell it explicitly to do so?

Isn't it clear that that's what every programmer wants? Don't these "Interlocked" methods just add unnecessary complication to the language?

Thanks.

CodePudding user response:

what does operations like the following do behind the scenes

As far as how it's implemented internally, CPU hardware arbitrates which core gets ownership of the cache line when there's contention. See Can num++ be atomic for 'int num'? for a C++ and x86 asm / cpu-architecture explanation of the details.
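To make the "no queue" point concrete, here is a rough C# sketch (illustrative only, not the real implementation) of how an atomic add can be expressed as a compare-and-swap retry loop. On x86, Interlocked.Add actually compiles down to a single lock xadd instruction, so no thread ever waits in a software queue:

    using System.Threading;

    static class AtomicSketch
    {
        // Roughly equivalent to Interlocked.Add(ref target, value), built from CompareExchange.
        public static int AddViaCas(ref int target, int value)
        {
            while (true)
            {
                int seen = Volatile.Read(ref target);      // snapshot the current value
                int desired = seen + value;                // compute the new value
                // Publish only if nobody changed target in the meantime; otherwise retry.
                if (Interlocked.CompareExchange(ref target, desired, seen) == seen)
                    return desired;
            }
        }
    }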

Re: why CPUs and compilers want to load early and store late:
see Java instruction reordering and CPU memory reordering
Atomic RMW prevents that, as do seq_cst store semantics on most ISAs, where you do a plain store and then a full barrier. (AArch64 has a special interaction between stlr and ldar to prevent StoreLoad reordering of seq_cst operations, but still allows reordering with other operations.)
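The classic litmus test for StoreLoad reordering can be written directly in C#. The harness below is an illustrative sketch (names invented for this answer); whether the reordered result actually shows up depends on the hardware and the JIT, and replacing the plain stores with Interlocked.Exchange (a full barrier) rules it out entirely:

    using System;
    using System.Threading;

    class StoreLoadDemo
    {
        static int x, y;

        static void Main()
        {
            for (int i = 0; i < 1_000_000; i++)
            {
                x = 0; y = 0;
                int r1 = -1, r2 = -1;

                var t1 = new Thread(() => { x = 1; r1 = y; });   // store x, then load y
                var t2 = new Thread(() => { y = 1; r2 = x; });   // store y, then load x
                t1.Start(); t2.Start();
                t1.Join(); t2.Join();

                // Both reading 0 is only possible if a load was reordered before the earlier store.
                if (r1 == 0 && r2 == 0)
                    Console.WriteLine($"StoreLoad reordering observed at iteration {i}");
            }
        }
    }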

Isn't it obvious that when a programmer write something like the following inside a multi-thread method [...]

What does that even mean? It's not running the same method in multiple threads that's a problem, it's accessing shared data. How is the compiler supposed to know which data will be accessed non-readonly from multiple threads at the same time, not inside a critical section?

There's no reasonable way to prove this in general, only in some simplistic cases. If a compiler were to try, it would have to be conservative, erring on the side of making more things atomic, at a huge cost in performance. The other kind of mistake would be a correctness problem, and if that could just happen when the compiler guesses wrong based on some undocumented heuristics, it would make the language unusable for multi-threaded programs.
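A tiny example of why sharing isn't visible in the code itself (class and method names invented for illustration): the increment below is identical in both usages, and only the caller knows whether the instance escapes to other threads.

    using System.Threading.Tasks;

    class Counter
    {
        private int _count;
        public void Bump() => _count++;     // fine if this instance stays thread-local
        public int Value => _count;
    }

    class Usage
    {
        static void ThreadLocalUse()
        {
            var local = new Counter();      // never shared: _count can live in a register
            for (int i = 0; i < 1000; i++) local.Bump();
        }

        static void SharedUse()
        {
            var shared = new Counter();     // the same Bump() now races: updates can be lost
            Parallel.For(0, 1000, _ => shared.Bump());
        }
    }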

Besides that, not all multi-threaded code needs sequential consistency all the time; often acquire/release or relaxed atomics are fine, but sometimes they aren't. It makes a lot of sense for programmers to be explicit about what ordering and atomicity their algorithm is built on.
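For example, a simple publish/consume pattern only needs release/acquire ordering, which .NET exposes as Volatile.Write / Volatile.Read; a full-barrier Interlocked operation would also be correct here, but it is stronger than the algorithm requires (sketch with invented names):

    using System.Threading;

    class Publisher
    {
        private int _payload;
        private int _ready;

        public void Publish(int value)
        {
            _payload = value;                  // plain store
            Volatile.Write(ref _ready, 1);     // release: _payload is visible before _ready
        }

        public bool TryConsume(out int value)
        {
            if (Volatile.Read(ref _ready) == 1)   // acquire: pairs with the release write
            {
                value = _payload;
                return true;
            }
            value = 0;
            return false;
        }
    }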

Also, you carefully design lock-free multi-threaded code to do things in a sensible order. In C++, you don't use Interlocked..., but instead you make a variable std::atomic<int> shared_int; for example. (Or use std::atomic_ref<int> to do atomic operations on variables that other code can access non-atomically, like using Interlocked functions.)

Having no explicit indication in the source of which operations are atomic with what ordering semantics would make it harder to read and maintain such code. Correct lock-free algorithms don't just happen by having the compiler turn individual operators into atomic ops.
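As a small example of what "carefully designed" means, here is the push half of a Treiber stack in C# using Interlocked.CompareExchange. The order of the steps (link the new node before publishing it) and the explicit CAS retry are the algorithm. This is a sketch only; a real implementation also needs a matching Pop and more care around reuse:

    using System.Threading;

    class LockFreeStack<T>
    {
        private sealed class Node
        {
            public T Value;
            public Node Next;
        }

        private Node _head;

        public void Push(T value)
        {
            var node = new Node { Value = value };
            while (true)
            {
                Node seen = Volatile.Read(ref _head);
                node.Next = seen;                       // link before publishing
                // Publish the new head only if it hasn't changed underneath us.
                if (Interlocked.CompareExchange(ref _head, node, seen) == seen)
                    return;
            }
        }
    }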


Promoting every operation to atomic would destroy performance. Most data isn't shared, even in functions that access some shared data structures.

Atomic RMW (like x86 lock add [rdi], eax) is much slower than a non-atomic operation, especially since non-atomic lets the compiler optimize variables into registers.

An atomic RMW on x86 is a full memory barrier, so making every operation atomic would destroy memory-level parallelism every time you use a += or ++.

e.g. one per ~18 cycles throughput on Skylake for lock xadd [mem], reg if hot in L1d cache, vs. one per 0.25 cycles for add reg, reg (https://uops.info), not to mention removing opportunities to optimize away and combine operations, and reducing the ability of out-of-order execution to overlap work.
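You can get a rough feel for the cost difference with a toy benchmark like the one below (loop count and names are arbitrary, and exact numbers will vary by CPU and JIT). The plain loop lets the counter live in a register, while the Interlocked loop pays for a full-barrier RMW on every iteration:

    using System;
    using System.Diagnostics;
    using System.Threading;

    class AtomicCost
    {
        static int _shared;

        static void Main()
        {
            const int N = 100_000_000;

            var sw = Stopwatch.StartNew();
            int local = 0;
            for (int i = 0; i < N; i++) local++;                              // plain add
            sw.Stop();
            Console.WriteLine($"plain:       {sw.ElapsedMilliseconds} ms (result {local})");

            sw.Restart();
            for (int i = 0; i < N; i++) Interlocked.Increment(ref _shared);   // lock inc: full barrier
            sw.Stop();
            Console.WriteLine($"interlocked: {sw.ElapsedMilliseconds} ms (result {_shared})");
        }
    }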

CodePudding user response:

This is a partial answer to the question you asked in the comments:

Why not? If I, as a programmer know exactly where I should put this protections, why can't the compiler?

In order for the compiler to do that, it would need to understand all possible execution paths through your program. This is effectively the Path Testing problem discussed here: https://softwareengineering.stackexchange.com/questions/277693/does-path-coverage-guarantee-finding-all-bugs

That article states that this is equivalent to the halting problem, which is computer-science-ese for saying it's an unsolvable problem.

The cool thing is that you want to do this in a world where you have multiple threads of execution running on possibly multiple processors. That makes an unsolvable problem that much harder to solve.

On the other hand, the programmer should know what his/her program does...
