Brief Description:
The code below is an implementation of a lock-free queue from C++ Concurrency in Action, 2nd Edition. The queue uses a linked list as the underlying data structure and is implemented with the split reference counting technique.
What confuses me is that there are multiple compare_exchange_strong operations in the code that use the acquire memory order, but I don't understand why this order is used. For example, among all accesses to the count member of a node object (see below), there are no store operations with release order, but there are many compare_exchange_strong operations with acquire order.
For the complete code for this lock-free queue, my explanation of the code, and a detailed description of my problem, please read the full description section.
Full description:
I am reading the second edition of C++ Concurrency in Action by Anthony Williams. The book describes how to implement a lock-free queue using the split reference counting technique. I'll start by briefly explaining how the code works, based on my understanding, to help you read the code quickly. The full implementation of this queue is given later.
The implementation uses a singly linked list as the underlying structure. The queue holds pointers to the head node and the tail node of the list (head and tail in the code), and tail always points to a dummy node. When pushing an element into the queue, we store the element value in the dummy node pointed to by tail, append a new dummy node after it, and make tail point to the new dummy node. When popping an element from the queue, we take the element pointed to by head and then set head to head->next. When head and tail point to the same node (i.e. the dummy node), the queue is empty.
The code uses reference counting to manage the lifetime of nodes that have been removed from the list. The reference count associated with a node is split into two parts, an external count and an internal count; their sum is the full reference count for that node. The external count is stored in the pointer to the node (counted_node_ptr in the code), while the internal count is stored inside the node object (the count member).
To prevent the node a thread is about to dereference from being deleted by another thread first, the external count must be incremented before the pointer is dereferenced, which guarantees the node stays alive. This is done by increase_external_count() in the code.
When an external reference no longer refers to a node, the external count stored in it must be folded into the node's internal count, which is done by free_external_counter(). The internal count is stored in count.internal_count in the node object, and count.external_counters is the number of external references that still refer to this node. Because each of those external references holds its own external count, all of them must be accounted for. The node can be safely deleted only when the internal count is 0 and the number of external counters is also 0.
For references in pop that did not end up pointing at the node actually being popped (usually because another thread popped it first), the reference is simply discarded, but its contribution to the reference count must be removed. This is done by release_ref(), which decrements the node's internal count by one. Every time the internal count is decremented, or the external count is merged into it, the code checks whether the conditions for deleting the node are met.
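To make the arithmetic concrete, here is a small worked example I put together (my own illustration, not from the book), assuming three threads race to pop the same node and that an earlier push has already moved tail past it:

#include <cassert>

int main()
{
    // State of the node's counts at the start of the race. The earlier
    // push already called free_external_counter for the tail reference,
    // so only one external counter remains and (for simplicity) the
    // internal count is back to 0.
    int external_count=1;     // external count in head's counted_node_ptr
    int internal_count=0;     // node_counter::internal_count
    int external_counters=1;  // node_counter::external_counters

    // Three threads each call increase_external_count(head,...):
    external_count+=3;                   // external_count is now 4

    // One thread wins head.compare_exchange_strong and calls
    // free_external_counter: external_count-2 drops one count for the
    // head pointer itself and one for the winning thread.
    internal_count+=external_count-2;    // +2
    --external_counters;                 // now 0

    // The two losing threads each call release_ref:
    internal_count-=2;                   // back to 0

    // Whichever thread acted last sees both counts at zero and deletes
    // the node.
    assert(internal_count==0 && external_counters==0);
}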
Here is the complete implementation of the lock-free queue in the book:
#include <atomic>
#include <memory>

template<typename T>
class lock_free_queue
{
private:
    struct node;
    struct counted_node_ptr
    {
        int external_count;
        node* ptr;
    };
    std::atomic<counted_node_ptr> head;
    std::atomic<counted_node_ptr> tail;
    struct node_counter
    {
        unsigned internal_count:30;
        unsigned external_counters:2;
    };
    struct node
    {
        std::atomic<T*> data;
        std::atomic<node_counter> count;
        counted_node_ptr next;
        node()
        {
            node_counter new_count;
            new_count.internal_count=0;
            new_count.external_counters=2;
            count.store(new_count);
            next.ptr=nullptr;
            next.external_count=0;
            data.store(nullptr);
        }
        void release_ref()
        {
            node_counter old_counter=
                count.load(std::memory_order_relaxed);
            node_counter new_counter;
            do
            {
                new_counter=old_counter;
                --new_counter.internal_count;
            }
            while(!count.compare_exchange_strong(
                old_counter,new_counter,
                std::memory_order_acquire,std::memory_order_relaxed));
            if(!new_counter.internal_count &&
               !new_counter.external_counters)
            {
                delete this;
            }
        }
    };
    static void increase_external_count(
        std::atomic<counted_node_ptr>& counter,
        counted_node_ptr& old_counter)
    {
        counted_node_ptr new_counter;
        do
        {
            new_counter=old_counter;
            ++new_counter.external_count;
        }
        while(!counter.compare_exchange_strong(
            old_counter,new_counter,
            std::memory_order_acquire,std::memory_order_relaxed));
        old_counter.external_count=new_counter.external_count;
    }
    static void free_external_counter(counted_node_ptr &old_node_ptr)
    {
        node* const ptr=old_node_ptr.ptr;
        int const count_increase=old_node_ptr.external_count-2;
        node_counter old_counter=
            ptr->count.load(std::memory_order_relaxed);
        node_counter new_counter;
        do
        {
            new_counter=old_counter;
            --new_counter.external_counters;
            new_counter.internal_count+=count_increase;
        }
        while(!ptr->count.compare_exchange_strong(
            old_counter,new_counter,
            std::memory_order_acquire,std::memory_order_relaxed));
        if(!new_counter.internal_count &&
           !new_counter.external_counters)
        {
            delete ptr;
        }
    }
public:
    lock_free_queue() {
        counted_node_ptr h;
        h.ptr = new node;
        h.external_count = 1;
        head.store(h);
        tail.store(h);
    }
    lock_free_queue(const lock_free_queue&) = delete;
    lock_free_queue& operator=(const lock_free_queue&) = delete;
    ~lock_free_queue() {
        while (pop())
            ;
    }
    void push(T new_value)
    {
        std::unique_ptr<T> new_data(new T(new_value));
        counted_node_ptr new_next;
        new_next.ptr=new node;
        new_next.external_count=1;
        counted_node_ptr old_tail=tail.load();
        for(;;)
        {
            increase_external_count(tail,old_tail);
            T* old_data=nullptr;
            if(old_tail.ptr->data.compare_exchange_strong(
                old_data,new_data.get()))
            {
                old_tail.ptr->next=new_next;
                old_tail=tail.exchange(new_next);
                free_external_counter(old_tail);
                new_data.release();
                break;
            }
            old_tail.ptr->release_ref();
        }
    }
    std::unique_ptr<T> pop()
    {
        counted_node_ptr old_head=head.load(std::memory_order_relaxed);
        for(;;)
        {
            increase_external_count(head,old_head);
            node* const ptr=old_head.ptr;
            if(ptr==tail.load().ptr)
            {
                ptr->release_ref();
                return std::unique_ptr<T>();
            }
            if(head.compare_exchange_strong(old_head,ptr->next))
            {
                T* const res=ptr->data.exchange(nullptr);
                free_external_counter(old_head);
                return std::unique_ptr<T>(res);
            }
            ptr->release_ref();
        }
    }
};
I think I fully understand how push and pop work and believe they are correct. What I don't understand is why some of the atomic operations in the code use the memory orders they do; the book doesn't give a reason.
The count member of the node object stores the node's internal count as well as the number of external references to the node. The count member is used in only a few places in the code:
- In the constructor of node, there is a seq-cst store for count:

node()
{
    node_counter new_count;
    new_count.internal_count=0;
    new_count.external_counters=2;
    count.store(new_count); // seq-cst store
    next.ptr=nullptr;
    next.external_count=0;
}
- In release_ref, there are a relaxed load and a compare_exchange_strong for count:

void release_ref()
{
    node_counter old_counter=
        count.load(std::memory_order_relaxed); // relaxed load
    node_counter new_counter;
    do
    {
        new_counter=old_counter;
        --new_counter.internal_count;
    }
    while(!count.compare_exchange_strong(
        old_counter,new_counter,
        std::memory_order_acquire,std::memory_order_relaxed)); // acquire RMW operation
    // ...
}
- In free_external_counter, there are also a relaxed load and an acquire compare_exchange_strong:

static void free_external_counter(counted_node_ptr &old_node_ptr)
{
    node* const ptr=old_node_ptr.ptr;
    int const count_increase=old_node_ptr.external_count-2;
    node_counter old_counter=
        ptr->count.load(std::memory_order_relaxed); // relaxed load
    node_counter new_counter;
    do
    {
        new_counter=old_counter;
        --new_counter.external_counters;
        new_counter.internal_count+=count_increase;
    }
    while(!ptr->count.compare_exchange_strong(
        old_counter,new_counter,
        std::memory_order_acquire,std::memory_order_relaxed)); // acquire RMW
    // ...
}
It can be seen that, apart from the seq-cst store performed in the constructor of node, the operations on count are either relaxed loads or compare_exchange_strong (read-modify-write) operations with acquire order; there are no store operations with release order.
So what is the purpose of using acquire memory order here? What operation is it supposed to synchronize with?
In addition, increase_external_count also uses acquire order; what is the reason there? And while the remaining atomic operations use the default seq-cst ordering, it seems that some of them could be replaced with weaker memory orderings as well.
Answer:
Good catch. These are data races. ThreadSanitizer agrees in both cases.
In free_external_counter:

while(!ptr->count.compare_exchange_strong(
    old_counter,new_counter,
    std::memory_order_acquire,std::memory_order_relaxed));
if(!new_counter.internal_count &&
   !new_counter.external_counters)
{
    delete ptr;
}
We read the object *ptr before the compare_exchange (in order to get ptr->count). But since the compare_exchange is not a release operation, the read of *ptr could in theory be reordered after it. If so, then by the time the read of *ptr happens, the counter has already been decremented, and in the meantime some other thread might have decremented it again and then done delete ptr.
I am not sure how this could happen in real life, short of having a time machine, because we had to read *ptr in order to even find the object ptr->count that we are supposed to be writing to. You'd need some miraculous machine that can speculatively make stores globally visible, and then roll back all other cores if it turns out the wrong address or value was stored. But anyway, the C++ memory model doesn't forbid it.
There's a similar bug in release_ref, where the object being raced on is *this.
I think these can be fixed by upgrading the compare-exchanges to acq_rel. That should ensure that all accesses to *ptr happen before its deletion. In fact, as you point out, we only need the acquire ordering in the case where the count has reached zero, since it's only the delete ptr that must be protected from being reordered before the compare_exchange. You could get that by using just release ordering on the compare_exchange and putting an acquire fence inside the if block, just before delete ptr;. I have not taken the time to prove this correct, though.
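For concreteness, this is roughly what that fence-based variant of release_ref would look like as a drop-in replacement for the member shown above (just a sketch of the idea, not something I've verified; it assumes free_external_counter's compare_exchange is also given release ordering, and std::atomic_thread_fence comes from <atomic>):

void release_ref()
{
    node_counter old_counter=
        count.load(std::memory_order_relaxed);
    node_counter new_counter;
    do
    {
        new_counter=old_counter;
        --new_counter.internal_count;
    }
    while(!count.compare_exchange_strong(
        old_counter,new_counter,
        std::memory_order_release,std::memory_order_relaxed)); // release RMW
    if(!new_counter.internal_count &&
       !new_counter.external_counters)
    {
        // The acquire fence pairs with the release RMWs done by the other
        // threads that decremented this count, so their earlier accesses
        // to *this happen-before the delete.
        std::atomic_thread_fence(std::memory_order_acquire);
        delete this;
    }
}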
On the other hand, I also think you're right that the compare_exchange in increase_external_count could be weakened to relaxed. When it's called, the counter must already be positive, or else accessing the counter itself is unsafe. So code that follows the compare_exchange would be equally well protected if it were reordered before it. (This is not a proof, of course.)
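For what it's worth, the weakened increase_external_count would then just be (again a sketch, not a proof):

static void increase_external_count(
    std::atomic<counted_node_ptr>& counter,
    counted_node_ptr& old_counter)
{
    counted_node_ptr new_counter;
    do
    {
        new_counter=old_counter;
        ++new_counter.external_count;
    }
    while(!counter.compare_exchange_strong(
        old_counter,new_counter,
        std::memory_order_relaxed,std::memory_order_relaxed)); // relaxed RMW
    old_counter.external_count=new_counter.external_count;
}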