This question is inspired by Lock-free Progress Guarantees. The code shown there is not strictly lock-free: whenever a writer thread is suspended mid-operation while the queue is not empty (or not full), the reader threads return false, preventing the whole data structure from making progress.
What should be the correct behavior for a truly lock-free ring buffer?
Generally, truly lock-free algorithms involve a phase where a running thread actually tries to ASSIST a pre-empted thread in completing its operation.
Any reference to this technique? How could it be implemented for an array-based MPMC queue?
I looked at some existing implementations, and they have similar problems.
- craflin's version uses two internal atomic counters to record the read and write positions.
- https://codereview.stackexchange.com/questions/263157/c-lock-free-mpmc-ring-buffer-in-c20 uses busy-waiting, so it is not lock-free.
CodePudding user response:
As a good example of how cross-thread assist often ends up working in real life, consider that a lock-free MPMC queue can be obtained by changing the liblfds algorithm along these lines:
Use 3 counters:
- `alloc_pos`: the total number of push operations that have been started. It is incremented atomically when a push starts.
- `write_pos`: all write operations at lower positions are known to be complete.
- `read_pos`: all items written at lower positions are known to have been consumed.
In this scheme, a push or pop operation is completed by a CAS in the affected slot. The `write_pos` and `read_pos` variables are redundant -- they only record progress that the per-slot states already imply, so that scans can stay short.
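To make that concrete, here is a minimal sketch of the data layout such a queue could use. Every name below (`AssistRing`, `SlotState`, the field names) is made up for illustration and is not the liblfds layout; it assumes a fixed capacity and a default-constructible `T`:

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>

// Hypothetical slot life cycle: Empty -> Writing -> Full -> Reading -> Empty.
// A push claims an Empty slot with a CAS, fills it, then publishes it as Full;
// a pop claims a Full slot with a CAS, takes the value, then releases it as Empty.
enum class SlotState : std::uint32_t { Empty, Writing, Full, Reading };

template <typename T, std::size_t Capacity>  // Capacity > 0, T default-constructible
struct AssistRing {
    struct Slot {
        std::atomic<SlotState> state{SlotState::Empty};
        T value{};                            // touched only by the thread that owns the slot
    };

    Slot slots[Capacity];                     // production code would pad slots/counters to cache lines
    std::atomic<std::uint64_t> alloc_pos{0};  // total pushes started (monotone)
    std::atomic<std::uint64_t> write_pos{0};  // hint: writes at lower positions are published
    std::atomic<std::uint64_t> read_pos{0};   // hint: items at lower positions are consumed

    static std::size_t index(std::uint64_t pos) { return static_cast<std::size_t>(pos % Capacity); }

    bool push(const T& v);                    // sketched further down
    std::optional<T> pop();                   // sketched further down
};
```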
So to push, a thread first increments `alloc_pos`, and then increments `write_pos` past all the slots in front of it that it can see are complete. This is an assist -- it is completing previous writes that were started in other threads. Then the thread has to scan the slots between `write_pos` and `alloc_pos` until it finds a free one and manages to reserve it with a CAS.
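Continuing the hypothetical `AssistRing` sketch above, the push path could look roughly like this. Two simplifications: the claim scan covers at most one full lap of the buffer from the `write_pos` hint instead of stopping exactly at `alloc_pos`, and a failed push does not undo its `alloc_pos` increment -- a real implementation would have to handle both, plus the ABA concerns mentioned at the end:

```cpp
template <typename T, std::size_t Capacity>
bool AssistRing<T, Capacity>::push(const T& v) {
    // Step 1: record that another push has started.
    const std::uint64_t alloc = alloc_pos.fetch_add(1, std::memory_order_relaxed) + 1;

    // Step 2 (the assist): advance write_pos past slots that other writers have
    // already published, i.e. finish their bookkeeping for them.
    std::uint64_t wp = write_pos.load(std::memory_order_acquire);
    while (wp < alloc &&
           slots[index(wp)].state.load(std::memory_order_acquire) == SlotState::Full) {
        write_pos.compare_exchange_weak(wp, wp + 1, std::memory_order_acq_rel);
        wp = write_pos.load(std::memory_order_acquire);  // re-read: someone (maybe us) advanced it
    }

    // Step 3: scan forward from the hint for a free slot and reserve it with a CAS.
    for (std::uint64_t pos = wp; pos < wp + Capacity; ++pos) {
        Slot& s = slots[index(pos)];
        SlotState expected = SlotState::Empty;
        if (s.state.compare_exchange_strong(expected, SlotState::Writing,
                                            std::memory_order_acquire)) {
            s.value = v;                                                // we own the slot now
            s.state.store(SlotState::Full, std::memory_order_release);  // publish the item
            return true;
        }
    }
    return false;  // no claimable slot within one lap: report the queue as full
}
```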
To pop, the reader first increments `read_pos` past all the items written at lower positions that it can see are consumed. Again this is an assist -- completing previous reads. Then it scans from `read_pos` to `alloc_pos` to see if it can find an item that is finished being written.
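A matching pop on the same hypothetical type, again with the claim scan simplified to one lap from the hint:

```cpp
template <typename T, std::size_t Capacity>
std::optional<T> AssistRing<T, Capacity>::pop() {
    // The assist: advance read_pos past items that other readers have already drained.
    std::uint64_t rp = read_pos.load(std::memory_order_acquire);
    const std::uint64_t wp = write_pos.load(std::memory_order_acquire);
    while (rp < wp &&
           slots[index(rp)].state.load(std::memory_order_acquire) == SlotState::Empty) {
        read_pos.compare_exchange_weak(rp, rp + 1, std::memory_order_acq_rel);
        rp = read_pos.load(std::memory_order_acquire);   // re-read after any advance
    }

    // Scan forward from the hint for a fully published item and claim it with a CAS.
    for (std::uint64_t pos = rp; pos < rp + Capacity; ++pos) {
        Slot& s = slots[index(pos)];
        SlotState expected = SlotState::Full;
        if (s.state.compare_exchange_strong(expected, SlotState::Reading,
                                            std::memory_order_acquire)) {
            T out = std::move(s.value);                                   // we own the slot now
            s.state.store(SlotState::Empty, std::memory_order_release);   // hand the slot back
            return out;
        }
    }
    return std::nullopt;  // no published item within one lap: report the queue as empty
}
```

Note what this buys: a thread parked in the middle of a push leaves one slot stuck in the `Writing` state and its item invisible until it resumes, but other producers and consumers keep completing operations on the remaining slots -- which is the progress property the question is after. The price is that items can be consumed out of strict FIFO order, one of the guarantee tradeoffs mentioned below.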
As mentioned in comments, doing this for real gets annoying, with implementation decisions trading off performance against which ordering and availability guarantees you need, as well as jumping through hoops to prevent ABA problems.