If atomic_compare_exchange isn't atomic on its own thread, how can it implement a lock?

If I have

std::atomic<uint64_t> guard;
// ...
if (std::atomic_compare_exchange_strong_explicit(
        &guard, &kExpected, kValue,
        std::memory_order_acquire, std::memory_order_relaxed)) {
   int foo = *non_atomic_thing;
   // ...
}

I know that the read of non_atomic_thing can't be re-ordered before the read of guard. But is it possible for it to be re-ordered before the write? That is, if some other thread reads guard with std::memory_order_acquire, observes that it's not equal to kValue, and writes to it, is this a data race?

(cppreference.com isn't very clear on this, but the docs for Rust, which has the same semantics, specifically say that acquire applies only to the load part and release only to the store part: "Using Acquire as success ordering makes the store part of this operation Relaxed".)

If this is true, how can I tell other threads not to overwrite my value while I'm reading it?

CodePudding user response:

The C++ standard is a little vague on this point. But what they probably intended, and what happens in practice on many implementations, is that your non-atomic read can be reordered before the write. See "For purposes of ordering, is atomic read-modify-write one operation or two?".

However, for the usual locking idiom, this is not a problem, and does not lead to a data race. In taking a lock, it's tempting to think of the essential operation as the store of the "locked" value, which tells other threads that the lock is ours and they can't have it. But actually, what's important is the load of the "unlocked" value. This proves that the lock belongs to our thread. The atomicity of the read-modify-write operation guarantees that no other thread will load that value until we reset it, and so we can safely enter the critical section, even if our store of the "locked" value doesn't become visible until some time later.
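
(To make the idiom concrete, here is a minimal spinlock sketch; the class and the names are mine, not from the question. The thing to notice is that ownership is established by the compare-exchange loading 0, not by its store of 1 becoming visible.)

#include <atomic>

// Minimal spinlock sketch (illustrative only).
// A thread owns the lock the moment its RMW *reads* 0 ("unlocked");
// its store of 1 may become visible to other threads a little later,
// but atomicity guarantees no other thread can also read that same 0.
class SpinLock {
    std::atomic<int> state{0};              // 0 = unlocked, 1 = locked
public:
    void lock() {
        int expected = 0;
        while (!state.compare_exchange_weak(expected, 1,
                   std::memory_order_acquire, std::memory_order_relaxed)) {
            expected = 0;                   // a failed CAS wrote the value it saw
        }
    }
    void unlock() {
        state.store(0, std::memory_order_release);
    }
};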

You could imagine this happening on an LL/SC architecture. Suppose the load-link has been done and returned the "unlocked" value. The store-conditional hasn't happened yet, and it could still fail if some other core accesses the lock variable and we lose our exclusive ownership of it. In the meantime, though, we can perform a speculative load of the non-atomic variable, with the understanding that it won't retire unless the SC succeeds. If the SC fails, the LL/SC pair has to be redone, and in that case our non-atomic load is redone as well. So by the time we finally do get a success, the non-atomic load will be visible after the LL, but possibly before the SC.

To see how this follows from the C++ memory ordering rules, let's consider a more fleshed-out example:

#include <atomic>

std::atomic<int> lock{0};   // 0 = unlocked, 1 = held by reader, 2 = held by writer
int non_atomic;

void reader() {
    int expected = 0;
    if (lock.compare_exchange_strong(expected, 1,
            std::memory_order_acquire, std::memory_order_relaxed)) {
        int local = non_atomic;        // non-atomic read, protected by holding the lock
        // ...
        lock.store(0, std::memory_order_release);
    }
}

void writer() {
    int expected = 0;
    if (lock.compare_exchange_strong(expected, 2,
            std::memory_order_acquire, std::memory_order_relaxed)) {
        non_atomic = 42;               // non-atomic write, protected by holding the lock
        // ...
        lock.store(0, std::memory_order_release);
    }
}
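
(For completeness, a small driver that exercises the example could look like the following; this is my addition, not part of the original code, and it assumes it is appended after the definitions above.)

#include <thread>

// Hypothetical driver for the example above (reader/writer as defined there).
// Each thread enters its critical section only if its compare-exchange
// observed lock == 0; a thread that instead sees 1 or 2 skips its body,
// so the two accesses to non_atomic can never race.
int main() {
    std::thread t1(reader);
    std::thread t2(writer);
    t1.join();
    t2.join();
}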

The atomicity promise of a read-modify-write operation comes from C++20 [atomics.order p10]:

Atomic read-modify-write operations shall always read the last value (in the modification order) written before the write associated with the read-modify-write operation.

The modification order of lock is a total order. Both critical sections are entered only if both compare-exchanges read the value 0. By the rule above, the 0 read by reader is immediately followed in the modification order by reader's 1, and the 0 read by writer is immediately followed by writer's 2, so the two compare-exchanges must have read distinct occurrences of 0. Counting the initial 0 and the two release stores of 0, the modification order of lock must therefore be either 0 1 0 2 0 or 0 2 0 1 0.

If it's 0 1 0 2 0, then the second 0 must come from the release store in reader. It can't be from the store in writer, because that one must come after 2 in the modification order (writer's store of 0 is sequenced after its store of 2; this is write-write coherence, [intro.races p15]). So the acquire load in writer reads the second 0 in the order, which was put there by the release store in reader, and so they synchronize. This ensures that the access to non_atomic in reader happens before the access in writer, and there is no data race.

If it's 0 2 0 1 0, the logic is reversed, and the access in writer happens before the access in reader; again no data race.
