Acqrel memory order with 3 threads

Time:03-16

Lately, the more I read about memory order in C++, the more confusing it gets. I hope you can help me clarify this (for purely theoretical purposes). Suppose I have the following code:

std::atomic<int> val = { 0 };

std::atomic<bool> f1 = { false };
std::atomic<bool> f2 = { false };

void thread_1() {
  f1.store(true, std::memory_order_relaxed);

  int v = 0;
  while (!val.compare_exchange_weak(v, v | 1, 
                                    std::memory_order_release));
}

void thread_2() {
  f2.store(true, std::memory_order_relaxed);

  int v = 0;
  while (!val.compare_exchange_weak(v, v | 2, 
                                    std::memory_order_release));
}

void thread_3() {
  auto v = val.load(std::memory_order_acquire);
  if (v & 1) assert(f1.load(std::memory_order_relaxed));
  if (v & 2) assert(f2.load(std::memory_order_relaxed));
}

The question is: can either of the assertions be false? On one hand, cppreference says that std::memory_order_release forbids the stores in threads 1 and 2 from being reordered after their exchanges, and that std::memory_order_acquire in thread 3 forbids either read from being reordered before the initial load. Thus, if thread 3 saw the first or the second bit set, the store to the corresponding boolean must already have happened, and it has to read true.

On the other hand, thread 3 synchronizes with whoever released the value it acquired from val. Can it happen (in theory if not in practice) that thread 3 "acquired" the exchange "1 -> 3" done by thread 2 (and therefore the f2 load returns true), but not the "0 -> 1" done by thread 1 (so the first assertion fires)? This possibility makes no sense under the "reordering" interpretation, yet I can't find confirmation anywhere that it cannot happen.

CodePudding user response:

Neither assertion can ever fail, thanks to ISO C++'s "release sequence" rules. This is the formalism that provides the guarantee you assumed must exist in your last paragraph.

The only stores to val are release-stores with the appropriate bits set, each done after the corresponding store to f1 or f2. So if thread_3 sees a value with a given bit set, it has definitely synchronized-with the writer that set the corresponding flag.

And crucially, they're each part of an RMW, and thus form a release-sequence that lets the acquire load in thread_3 synchronize-with both CAS writes, if it happens to see val == 3.

(Even a relaxed RMW can be part of a release-sequence, although in that case there wouldn't be a happens-before guarantee for stuff before the relaxed RMW, only for other release operations by this or other threads on this atomic variable. If thread_2 had used mo_relaxed, the assert on f2 could fail, but it still couldn't break things so the assert on f1 could ever fail. See also What does "release sequence" mean? and https://en.cppreference.com/w/cpp/atomic/memory_order)


If it helps, I think those CAS loops are fully equivalent to val.fetch_or(1, std::memory_order_release) and val.fetch_or(2, std::memory_order_release). That's certainly how a compiler would implement fetch_or on a machine with CAS but no atomic-OR primitive. IIRC, in the ISO C++ model, a failed CAS is only a load, not an RMW. Not that it matters here; even a relaxed no-op RMW would still continue a release sequence.

(Fun fact: x86 asm lock cmpxchg is always a real RMW, even on failure, at least on paper. But it's also a full barrier, so basically irrelevant to any reasoning about weakly-ordered RMWs.)
