When is ok to break up atomic Read-Modify-Write operations into constituent relaxed operations bar-CodePudding

In the model-checked implementation of the CL-Deque, they use the following strategy to decrement the bottom pointer:

size_t b = atomic_load_explicit(&q->bottom, memory_order_relaxed) - 1;
Array *a = (Array *) atomic_load_explicit(&q->array, memory_order_relaxed);
atomic_store_explicit(&q->bottom, b, memory_order_relaxed);
atomic_thread_fence(memory_order_seq_cst);

So they load the bottom pointer, decrement it locally, then store it. Why is it valid to do this? Wouldn't a concurrent thief be able to see either one of the bottom values?

An alternative way to perform this would be to combine the read-modify-write operation into a single atomic_fetch_sub, like so:

Array *a = (Array *) atomic_load_explicit(&q->array, memory_order_relaxed);
size_t b = atomic_fetch_sub_explicit(&q->bottom, 1, memory_order_seq_cst) - 1;

This would remove that possible race condition.

I think that the broken-up version is valid because of the CAS inside of the steal and take functions resolves this race later, only if the deque is small enough for it to matter. But is this a general technique?

CodePudding user response：

It's a single producer/multi-consumer queue, where the producer uses push/take operations and consumers use the "steal" operation. The producer is the only one who modifies the "bottom" variable. Therefore, using atomic_fetch_sub_explicit is not necessary, because the consumer (thief) will see either the value before or after the store/sub operation in any case. The important aspect is the transaction semantics, which, as you pointed out, are concluded with a CAS operation on both ends.

Here is the original paper.

CodePudding user response：

A barrier after all the relaxed operations doesn't make them like a seq_cst operation; it won't have release semantics wrt. earlier loads/stores because there's no barrier separating them from it. If there's a later relaxed store after that, you could make it seq_cst, or if you're not sure of everything that's being ordered, keep the separate fence; fences are stronger than operations.

Breaking up a or -- (or other RMW) is safe when you're the only thread that could be concurrently writing this variable. (Ever, or right now because we hold a lock or otherwise have exclusive ownership). e.g. in a SeqLock or single-producer queue.

Readers are fine, that's why you're using std::atomic. But by definition, readers can't step on our atomic-load/modify/atomic-store sequence. Thus we don't need RMW atomicity, only atomicity of the store itself, which is all that reader threads will be able to see. Whether that's the store side of an RMW or just a pure store doesn't matter.

If this isn't the case, you can't "repair" it later; the damage is potentially already done. You can however design a whole algorithm so that if there was a problem earlier, it can be detected later and you can bail out. That might be what you're talking about, but IDK, I didn't investigate the details of the code you linked, just wanted to answer the general question.