How does mixing relaxed and acquire/release accesses on the same atomic variable affect synchronises-CodePudding

I have a question about the definition of the synchronises-with relation in the C memory model when relaxed and acquire/release accesses are mixed on one and the same atomic variable. Consider the following example consisting of a global initialiser and three threads:

int x = 0;
std::atomic<int> atm(0);

[thread T1]
x = 42;
atm.store(1, std::memory_order_release);

[thread T2]
if (atm.load(std::memory_order_relaxed) == 1)
    atm.store(2, std::memory_order_relaxed);

[thread T3]
int value = atm.load(std::memory_order_acquire);
assert(value != 1 || x == 42);  // Hopefully this is guaranteed to hold.
assert(value != 2 || x == 42);  // Does this assert hold necessarily??

My question is whether the second assert in T3 can fail under the C memory model. Note that the answer to this SO question suggests that the assert could not fail if T2 used load/acquire and store/release; please correct me if I got this wrong. However, as stated above, the answer seems to depend on how exactly the synchronises-with relation is defined in this case. I was confused by the text on cppreference, and I came up with the following two possible readings.

The second assert fails. The store to atm in T1 could be conceptually understood as storing 1_release where _release is annotation specifying how the value was stored; along the same lines, the store in T2 could be understood as storing 2_relaxed. Hence, if the load in T3 returns 2, the thread actually read 2_relaxed; thus, the load in T3 does not synchronise-with the store in T1 and there is no guarantee that T3 sees x == 42. However, if the load in T3 returns 1, then 1_release was read, and therefore the load in T3 synchronises-with the store in T1 and T3 is guaranteed to see x == 42.
The second assert success. If the load in T3 returns 2, then this load reads a side-effect of the relaxed store in T2; however, this store of T2 is present in the modification order of atm only if the modification order of atm contains a preceding store with a release semantics. Therefore, the load/acquire in T3 synchronises-with the store/release of T1 because the latter necessarily precedes the former in the modification order of atm.

At first glance, the answer to this SO question seems to suggest that my reading 1 is correct. However, that answer seems to be different in a subtle way: all stores in the answer are release, and the crux of the question is to see that load/acquire and store/release establishes synchronises-with between a pair of threads. In contrast, my question is about how exactly synchronises-with is defined when memory orders are heterogeneous.

I actually hope that reading 2 is correct since this would make reasoning about concurrency easier. Thread T2 does not read or write any memory other than atm; therefore, T2 itself has no synchronisation requirements and should therefore be able to use relaxed memory order. In contrast, T1 publishes x and T3 consumes it -- that is, these two threads communicate with each other so they should clearly use acquire/release semantics. In other words, if interpretation 1 turns out to be correct, then the code T2 cannot be written by thinking only about what T2 does; rather, the code of T2 needs to know that it should not "disturb" synchronisation between T1 and T3.

In any case, knowing what exactly is sanctioned by the standard in this case seems absolutely crucial to me.

CodePudding user response：

Because you use relaxed ordering on a separate load & store in T2, the release sequence is broken and the second assert can trigger (although not on a TSO platform such as X86).
You can fix this by either using acq/rel ordering in thread T2 (as you suggested) or by modifying T2 to use an atomic read-modify-write operation (RMW), like this:

[Thread T2]
int ret;
do {
    int val = 1;
    ret = atm.compare_exchange_weak(val, 2, std::memory_order_relaxed);
} while (ret != 0);

The modification order of atm is 0-1-2 and T3 will pick up on either 1 or 2 and no assert can fail.

Another valid implementation of T2 is:

[thread T2]
if (atm.load(std::memory_order_relaxed) == 1)
{
    atm.exchange(2, std::memory_order_relaxed);
}

Here the RMW itself is unconditional and it must be accompanied by an if-statement & (relaxed) load to ensure that the modification order of atm is 0-1 or 0-1-2
Without the if-statement, the modification order could be 0-2 which can cause the assert to fail. (This works because we know there is only one other write in the whole rest of the program. Separate if() / exchange is of course not in general equivalent to compare_exchange_strong.)

In the C standard, the following quotes are related:

[intro.races]
A release sequence headed by a release operation A on an atomic object M is a maximal contiguous subsequence of side effects in the modification order of M, where the first operation is A, and every subsequent operation is an atomic read-modify-write operation.

[atomics.order]
An atomic operation A that performs a release operation on an atomic object M synchronizes with an atomic operation B that performs an acquire operation on M and takes its value from any side effect in the release sequence headed by A.

this question is about why an RMW works in a release sequence.