Home > Net >  What is the significance of 'strongly happens before' compared to '(simply) happens b
What is the significance of 'strongly happens before' compared to '(simply) happens b

Time:01-05

The standard defines several 'happens before' relations that extend the good old 'sequenced before' over multiple threads:

[intro.races]

11 An evaluation A simply happens before an evaluation B if either

(11.1) — A is sequenced before B, or
(11.2) — A synchronizes with B, or
(11.3) — A simply happens before X and X simply happens before B.

[Note 10: In the absence of consume operations, the happens before and simply happens before relations are identical. — end note]

12 An evaluation A strongly happens before an evaluation D if, either

(12.1) — A is sequenced before D, or
(12.2) — A synchronizes with D, and both A and D are sequentially consistent atomic operations ([atomics.order]), or
(12.3) — there are evaluations B and C such that A is sequenced before B, B simply happens before C, and C is sequenced before D, or
(12.4) — there is an evaluation B such that A strongly happens before B, and B strongly happens before D.

[Note 11: Informally, if A strongly happens before B, then A appears to be evaluated before B in all contexts. Strongly happens before excludes consume operations. — end note]

(bold mine)

The difference between the two seems very subtle. 'Strongly happens before' is never true for matching pairs or release-acquire operations (unless both are seq-cst), but it still respects release-acquire syncronization in a way, since operations sequenced before a release 'strongly happen before' the operations sequenced after the matching acquire.

Why does this difference matter?

'Strongly happens before' was introduced in C 20, and pre-C 20, 'simply happens before' used to be called 'strongly happens before'. Why was it introduced?

[atomics.order]/4 says that the total order of all seq-cst operations is consistent with 'strongly happens before'.

Does it mean that it's not consistent with 'simply happens before'? If so, why not?


I'm ignoring the plain 'happens before', because it differs from 'simply happens before' only in its handling of memory_order_consume, the use of which is temporarily discouraged, since apparently most (all?) major compilers treat it as memory_order_acquire.

I've already seen this Q&A, but it doesn't explain why 'strongly happens before' exists, and doesn't fully address what it means (it just states that it doesn't respect release-acquire syncronization, which isn't completely the case).


Found the proposal that introduced 'simply happens before'.

I don't fully understand it, but it explains following:

  • 'Strongly happens before' is a weakened version of 'simply happens before'.
  • The difference is only observable when seq-cst is mixed with aqc-rel on the same variable (I think, it means when an acquire load reads a value from a seq-cst store, or when an seq-cst load reads a value from a release store). But the exact effects of mixing the two are still unclear to me.

CodePudding user response:

Here's my current understanding, which could be incomplete or incorrect. A verification would be appreciated.


C 20 renamed strongly happens before to simply happens before, and introduced a new, more relaxed definition for strongly happens before, which imposes less ordering.

Simply happens before is used to reason about the presence of data races in your code. (Actually that would be the plain 'happens before', but the two are equivalent in absence of consume operations, the use of which is discouraged by the standard, since most (all?) major compilers treat them as acquires.)

The weaker strongly happens before is used to reason about the global order of seq-cst operations.


This change was introduced in proposal P0668R5: Revising the C memory model, which is based on the paper Repairing Sequential Consistency in C/C 11 by Lahav et al (which I didn't fully read).

The proposal explains why the change was made. Long story short, the way most compilers implement atomics on Power and ARM architectures turned out to be non-conformant in rare edge cases, and fixing the compilers had a performance cost, so they fixed the standard instead.

The change only affects you if you mix seq-cst operations with acquire-release operations on the same atomic variable (i.e. if an acquire operation reads a value from a seq-cst store, or a seq-cst operation reads a value from a release store).

If you don't mix operations in this manner, then you're not affected (i.e. can treat simply happens before and strongly happens before as equivalent).

The gist of the change is that the synchronization between a seq-cst operation and the corresponding acquire/release operation no longer affects the position of this specific seq-cst operation in the global seq-cst order, but the synchronization itself is still there.

This makes the seq-cst order for such seq-cst operations very moot, see below.


The proposal presents following example, and I'll try to explain my understanding of it:

atomic_int x = 0, y = 0;
int a = 0, b = 0, c = 0;
// Thread 1
x.store(1, seq_cst);
y.store(1, release);
// Thread 2
b = y.fetch_add(1, seq_cst); // b = 1 (the value of y before increment)
c = y.load(relaxed); // c = 3
// Thread 3
y.store(3, seq_cst);
a = x.load(seq_cst); // a = 0

The comments indicate one of the ways that this code can execute, which the standard used to forbid (before this change), but which actually can happen on the affected architectures.

The execution proceeds as follows:

.-- T3 y.store(3, seq_cst);                   --.                 (2)
|        |                                      | strongly
|        | sequenced before                     | happens
|        V                                      | before
|   T3 a = x.load(seq_cst); // a = 0    --.   <-'                 (3)
|                                         : coherence-
|                                         : ordered
|                                         : before
|   T1 x.store(1, seq_cst);             <-'   --. --.             (4)
|        |                                      |st |
|        | sequenced before                     |h  |
|        V                                      |b  |
| . T1 y.store(1, release);                   <-'   |
| |      :                                          | strongly
| |      : synchronizes with                        | happens
| |      V                                          | before
| > T2 b = y.fetch_add(1, seq_cst); // b = 1  --.   |             (1)
|        |                                      |st |
|        | sequenced before                     |h  |
|        V                                      |b  |
'-> T2 c = y.load(relaxed); // c = 3          <-' <-'

Where:

  • Parenthesized numbers on the right show the global seq-cst order.

  • Arrows on the left show how the values propagate between some loads and stores.

  • Arrows in the middle show:

    • 'Sequenced before', the good old single-threaded evaluation order.
    • 'Synchronizes with', the release-acquire synchronization (seq-cst loads count as acquire operations, and seq-cst stores count as release operations).

    Those two together comprise 'simply happens before'.

  • Arrows on the right are based on the arrows in the middle, they show:

    • The newly redefined 'strongly happens before' relation.

    • 'Coherence-ordered before', a new relation introduced in this proposal, which is only used to define the global seq-cst order, and apparently imposes no synchronization (unlike release-acquire operations).

      It seems that it includes everything other than 'simply happens before' that affects the global seq-cst order. In this case, it's just common sense that if a load doesn't see the value written by the store, then the load goes before the store.

    The global seq-cst order is consistent with both.

Notice that on this picture, nothing strongly happens before b = y.fetch_add(1, seq_cst);, so there isn't anything that must be before it in the global seq-cst order, so it's possible to move it way up to the beginning of the seq-cst order, which is what ends up happening is this scenario, even though it reads the values produced by later (in this order) operations.

  •  Tags:  
  • Related