Do I understand the semantics of std::memory_order correctly?


cppreference.com states about memory_order::seq_cst:

A load operation with this memory order performs an acquire operation, a store performs a release operation, and read-modify-write performs both an acquire operation and a release operation, plus a single total order exists in which all threads observe all modifications in the same order.

[ Q1 ]: Does this mean that the single total order runs through every seq_cst operation on all atomic variables (not only the one being accessed)?

[ Q2 ]: And are release, acquire and acq_rel operations not included in the "single total order"?

I understood that seq_cst behaves like release for a store, acquire for a load, and acq_rel for a read-modify-write, but I'm confused about whether seq_cst can also order operations on other atomic variables, not only the same variable.

CodePudding user response:

cppreference is only a summary of the C++ standard, and sometimes its text is less precise. The actual standard draft makes it clear: the final C++20 working draft N4861 states in [atomics.order], par. 4 (p. 1525):

There is a single total order S on all memory_order::seq_cst operations, including fences, that satisfies the following constraints [...]

This clearly says all seq_cst operations, not just all operations on a particular object.
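
To sketch what that buys you in practice (the variable and thread names below are illustrative, not from the question): the four operations act on two different atomic objects, yet they all participate in the same single total order S, so the outcome r1 == 0 && r2 == 0 is impossible.

    #include <atomic>
    #include <cassert>
    #include <thread>

    std::atomic<int> x{0}, y{0};
    int r1, r2;

    void t1() {
        x.store(1, std::memory_order_seq_cst);   // participates in S
        r1 = y.load(std::memory_order_seq_cst);  // participates in S
    }

    void t2() {
        y.store(1, std::memory_order_seq_cst);   // participates in S
        r2 = x.load(std::memory_order_seq_cst);  // participates in S
    }

    int main() {
        std::thread a(t1), b(t2);
        a.join();
        b.join();
        // One of the two stores is first in S, so at least one thread
        // must observe the other thread's store.
        assert(r1 == 1 || r2 == 1);   // r1 == 0 && r2 == 0 is forbidden
    }

Downgrading the four operations to memory_order_release / memory_order_acquire removes them from S and makes r1 == 0 && r2 == 0 a legal outcome.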

And notes 6 and 7 further down emphasize that the order does not apply to weaker memory orders:

6 [Note: We do not require that S be consistent with “happens before” (6.9.2.1). This allows more efficient implementation of memory_order::acquire and memory_order::release on some machine architectures. It can produce surprising results when these are mixed with memory_order::seq_cst accesses. — end note]

7 [Note: memory_order::seq_cst ensures sequential consistency only for a program that is free of data races and uses exclusively memory_order::seq_cst atomic operations. Any use of weaker ordering will invalidate this guarantee unless extreme care is used. In many cases, memory_order::seq_cst atomic operations are reorderable with respect to other atomic operations performed by the same thread. — end note]

CodePudding user response:

I find this part a bit misleading:

A load operation with this memory order performs an acquire operation, a store performs a release operation, and read-modify-write performs both an acquire operation and a release operation, plus a single total order exists in which all threads observe all modifications in the same order.

Let's have a look at the following counter example:

CPU1:
   a = 1 // release store
   int r1 = b // acquire load

Then based on the above definition for SC, I would presume that the store of a and the load of b can't be reordered:

  • we have a release-store and an acquire-load
  • we (can) have a total order over all loads/stores

So we have satisfied the above definition for sequential consistency.

But a release-store followed by an acquire-load to a different address can be reordered. The canonical example would be Dekker's algorithm. Therefore the above definition for SC is broken: it is missing the requirement that the order must be consistent with program order. Apart from a compiler messing things up, the typical cause of this violation is the store buffer that most modern CPUs have, which can cause an older store to be reordered with a newer load to a different address.
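
Here is a sketch of the two-sided version of that counterexample with std::atomic (variable and function names are illustrative). With only release/acquire ordering, both threads may read 0, which is exactly the store-buffering reordering that breaks Dekker's algorithm; making all four operations seq_cst forbids that outcome.

    #include <atomic>
    #include <thread>

    std::atomic<int> a{0}, b{0};
    int r1, r2;

    void cpu1() {
        a.store(1, std::memory_order_release);   // release store
        r1 = b.load(std::memory_order_acquire);  // acquire load
    }

    void cpu2() {
        b.store(1, std::memory_order_release);   // release store
        r2 = a.load(std::memory_order_acquire);  // acquire load
    }

    int main() {
        std::thread t1(cpu1), t2(cpu2);
        t1.join();
        t2.join();
        // Allowed outcome with release/acquire: r1 == 0 && r2 == 0,
        // because each store may be reordered with the later load to a
        // different address. With memory_order_seq_cst on all four
        // operations this outcome is forbidden.
    }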

The single total order is a different concern from CPU-local instruction reordering such as you get with store buffers. It effectively means that there is some moment at which an operation takes effect in the memory order, and nobody should be able to disagree with that. The standard litmus test for this is independent reads of independent writes (IRIW):

CPU1:
   A=1
CPU2:
   B=1
CPU3:
   r1=A
   [LoadLoad]
   r2=B
CPU4:
   r3=B
   [LoadLoad]
   r4=A

So could it be that CPU3 and CPU4 see the stores to different addresses in different orders? If the answer is yes, then no total order over the loads/stores exists.
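
As a sketch, the same IRIW test with std::atomic (names are illustrative). With memory_order_seq_cst everywhere the answer is no: the readers cannot observe the two stores in opposite orders, because that would require two contradictory positions for the stores in the total order S. With release stores and acquire loads the mixed outcome is allowed, and it is observable on some hardware (e.g. POWER).

    #include <atomic>
    #include <thread>

    std::atomic<int> A{0}, B{0};
    int r1, r2, r3, r4;

    void cpu1() { A.store(1, std::memory_order_seq_cst); }
    void cpu2() { B.store(1, std::memory_order_seq_cst); }

    void cpu3() {
        r1 = A.load(std::memory_order_seq_cst);
        r2 = B.load(std::memory_order_seq_cst);
    }

    void cpu4() {
        r3 = B.load(std::memory_order_seq_cst);
        r4 = A.load(std::memory_order_seq_cst);
    }

    int main() {
        std::thread t1(cpu1), t2(cpu2), t3(cpu3), t4(cpu4);
        t1.join(); t2.join(); t3.join(); t4.join();
        // Forbidden with seq_cst: r1 == 1 && r2 == 0 && r3 == 1 && r4 == 0,
        // i.e. CPU3 seeing A's store first and CPU4 seeing B's store first.
    }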

Another cause of not having a total order over the loads/stores is store-to-load forwarding (STLF).

CPU1:
   A=1
   r1=A
   r2=B

CPU2:
   B=1
   r3=B
   r4=A

Is it possible that r1=1, r2=0, r3=1 and r4=0?

On x86 this is possible due to store-to-load forwarding: if CPU1 does a store of A followed by a load of A, the CPU looks in its store buffer for the value of A. This causes the store of A not to be atomic; the local CPU can see its own store early, and the consequence is that no total order over the loads/stores exists.

So instead of a total order over all loads/stores, the guarantee is reduced to a total order over the stores, and this is where the x86 memory model gets its name (Total Store Order).
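
For completeness, a sketch of that litmus test with std::atomic (names are illustrative). If every access is memory_order_seq_cst, the implementation has to prevent this outcome, which is why a seq_cst store on x86 is typically compiled to mov plus mfence (or an xchg) rather than a plain mov; with memory_order_relaxed (or the plain stores/loads of the pseudocode above) the outcome r1 == 1, r2 == 0, r3 == 1, r4 == 0 is allowed.

    #include <atomic>
    #include <thread>

    std::atomic<int> A{0}, B{0};
    int r1, r2, r3, r4;

    void cpu1() {
        A.store(1, std::memory_order_seq_cst);
        r1 = A.load(std::memory_order_seq_cst);  // always 1 (reads its own store)
        r2 = B.load(std::memory_order_seq_cst);
    }

    void cpu2() {
        B.store(1, std::memory_order_seq_cst);
        r3 = B.load(std::memory_order_seq_cst);  // always 1 (reads its own store)
        r4 = A.load(std::memory_order_seq_cst);
    }

    int main() {
        std::thread t1(cpu1), t2(cpu2);
        t1.join();
        t2.join();
        // With seq_cst everywhere, r2 == 0 && r4 == 0 is forbidden, so the
        // outcome r1 == 1, r2 == 0, r3 == 1, r4 == 0 cannot occur.
    }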

[edit] Made some clarifications, cleaned up some text, and reworked the original example because it was misleading.
