Home > Software engineering >  Does compiler need to care about other threads during optimizations?
Does compiler need to care about other threads during optimizations?

Time:06-08

This is a spin-off from a discussion about C# thread safety guarantees.

I had the following presupposition:

in absence of thread-aware primitives (mutexes, std::atomic* etc., let's exclude volatile as well for simplicity) a valid C compiler may do any kinds of transformations, including introducing reads from the memory (or e. g. writes if it wants to), if the semantics of the code in the current thread (that is, output and [excluded in this question] volatile accesses) remain the same from the current thread's point of view, that is, disregarding existence of other threads. The fact that introducing reads/writes may change other thread's behavior (e. g. because the other threads read the data without proper synchronization or performing other kinds of UB) can be totally ignored by a standard-conform compiler.

Is this presupposition correct or not? I would expect this to follow from the as-if rule. (I believe it is, but other people seem to disagree with me.) If possible, please include the appropriate normative references.

CodePudding user response:

Yes, C defines data race UB as potentially-concurrent access to non-atomic objects when not all the accesses are reads. Another recent Q&A quotes the standard, including.

[intro.races]/2 - Two expression evaluations conflict if one of them modifies a memory location ... and the other one reads or modifies the same memory location.

[intro.races]/21 ... The execution of a program contains a data race if it contains two potentially concurrent conflicting actions, at least one of which is not atomic, and neither happens before the other, ...

Any such data race results in undefined behavior.


That gives the compiler freedom to optimize code in ways that preserve the behaviour of the thread executing a function, but not what other threads (or a debugger) might see if they go looking at things they're not supposed to. (i.e. data race UB means that the order of reading/writing non-atomic variables is not part of the observable behaviour an optimizer has to preserve.)

introducing reads/writes may change other thread's behavior

The as-if rule allows you to invent reads, but no you can't invent writes to objects this thread didn't already write. That's why if(a[i] > 10) a[i] = 10; is different from a[i] = a[i]>10 ? 10 : a[i].

It's legal for two different threads to write a[1] and a[2] at the same time, and one thread loading a[0..3] and then storing back some modified and some unmodified elements could step on the store by the thread that wrote a[2].

Crash with icc: can the compiler invent writes where none existed in the abstract machine? is a detailed look at a compiler bug where ICC did that when auto-vectorizing with SIMD blends. Including links to Herb Sutter's atomic weapons talk where he discusses the fact that compilers must not invent writes.

By contrast, AVX-512 masking and AVX vmaskmovps etc, like ARM SVE and RISC-V vector extensions I think, do have proper masking with fault suppression to actually not store at all to some SIMD elements, without branching.


It's legal to invent atomic RMWs (except without the Modify part), e.g. an 8-byte lock cmpxchg [rcx], rdx if you want to modify some of the bytes in that region. But in practice that's more costly than just storing modified bytes individually so compilers don't do that.


Of course a function that does unconditionally write a[2] can write it multiple times, and with different temporary values before eventually updating it to the final value. (Probably only a Deathstation 9000 would invent different-valued temporary contents, like turning a[2] = 3 into a[2] = 2; a[2] ;)

For more about what compilers can legally do, see Who's afraid of a big bad optimizing compiler? on LWN. The context for that article is Linux kernel development, where they rely on GCC to go beyond the ISO C standard and actually behave in sane ways that make it possible to roll their own atomics with volatile int* and inline asm. It explains many of the practical dangers of reading or writing a non-atomic shared variable.

  • Related