Lock-free stack with freelist: why don't the next pointers need to be atomic?-CodePudding

A lock-free stack can be implemented as a singly linked list. This seems simple until we have to think about what to do with nodes after they've been popped. One strategy is to simply move them to a per-stack LIFO freelist (from which nodes can be reused by subsequent push operations) until eventually all threads are done with the stack, at which point a single thread destroys all nodes in the stack and all nodes in the freelist. Boost.Lockfree uses this strategy. So does Chris Wellons's C11 implementation. I will refer to the latter, because it's easier to read and the details are essentially the same, since C11 atomics are very similar to C 11 atomics.

In Wellons's implementation, which can be found on GitHub here, all lstack_node objects are non-atomic. In particular, this means that all accesses to the next member of an lstack_node object are non-atomic. What I am unable to understand is: why is it that such accesses never race with each other?

The next member is read at lstack.c:30. It is written at lstack.c:39. If these two lines can execute concurrently on the same lstack_node object, then the program contains a race. Is this possible? It seems possible to me:

Thread 1 calls lstack_pop, which calls pop. It atomically loads the head node's value into the local variable orig. Now, orig.node is a pointer to the node that was at the top of the stack just now. (Note that up until this point, only local variables have been modified, so it is impossible for anything that thread 1 has done so far to make a CAS fail in any other thread.) Meanwhile...
Thread 2 calls lstack_pop. pop succeeds and returns node, a pointer to the node that has just been excised from the stack; this is the same node that orig.node points to in thread 1. It then begins to call push in order to add node to the freelist. The freelist head node is atomically loaded, and node->next is set to point to the first node in the freelist.
Oops. This races with the read to orig.node->next in thread 1.

Could Wellons's implementation simply be wrong? I doubt it. If his implementation is wrong, then so is the Boost one, because the only way to fix (what appears to me to be) the race condition is to make next atomic. But I don't think the Boost implementation could be wrong in such a basic way without it having been noticed and fixed by now. So I must have made a mistake in my reasoning.

CodePudding user response：

I just wrote a long text trying to explain why there cannot be a race, until I took a closer look at how Wellson implemented the free list, and I came to the conclusion that you are correct!

The important point here is what you mentioned at the very beginning of your question:

This seems simple until we have to think about what to do with nodes after they've been popped. One strategy is to simply move them to a per-stack LIFO freelist until eventually all threads are done with the stack, at which point a single thread destroys all nodes in the stack and all nodes in the freelist.

But that is not how the freelist in Wellson's implementation works! Instead it tries to reuse nodes from the freelist, but then next also needs to be atomic as you observed correctly. If the free list would have been implemented as you described, i.e., the popped nodes would be added to some freelist (unaltered, i.e., the freelist uses a different pointer than next) and only released once the stack is no longer used by any thread, then next could be a plain variable since it would not be change once the node has been pushed successfully.

That does not necessarily mean that the boost lock-free queue is also incorrect, but I don't know the code good enough to make a qualified statement about the boost implementation.

FWIW - this is usually referred as the memory reclamation problem. This freelist approach is a simple solution, though usually not practical for real-world scenarios. For a real-world scenario you probably want to use a memory reclamation scheme like hazard pointers or epoch based reclamation. You can take a look at my xenium library where I have implemented several different reclamation schemes as well as lock-free data structures that use them. More information about the memory reclamation problem and my implementations in xenium can be also found in my thesis Effective memory reclamation for lock-free data structures in C .

CodePudding user response：

The key thing to note is that the next fields are read-only for every node that is currently in a linked list. The next can only be modified when the node has been successfully removed from a linked list. Once that happens, the thread that removed it 'owns' it and noone else can sensibly look at it (they might read a value, but that value will be thrown away when their compare_and_set fails.) So that owning thread can safely modify the next field as part of pushing it on another list.

In your hypothetical, you're missing the fact that the two pops (done by the two threads) cannot both succeed and return the same node. If two threads try to simultaneously pop, they might get the same node pointer, but one will fail in the compare_and_set atomic instruction and will loop back with a different node pointer.

This does require that read/write races are "safe" -- that is when you have a read/write race between two threads, the reader might get any value but won't otherwise fail (no trap or other undefined behavior), and won't otherwise interfere with the write, but that tends to be the case on most (all?) hardware. As long as the reader does not depend on the value read during a race, you can detect the race after the fact and disregard that value.