I have 2 thread-safe lists (same type), where each node has a mutex (I know that it's not the optimal solution, but that's what I was asked to do). I have the below function, where I want to match 2 nodes, each one from a different list. The logic is that I pass one node from the first list and I check with all the nodes from the second list.

Here is the type of the node and the list:

typedef struct node_type{
    //some data;
    struct node_type *next;
    pthread_mutex_t lock;
}node_t;

typedef struct list_type{
    node_t *head;
    pthread_mutex_t lock;
}list_t;

Here is the code from the function:

node_t *find_match(list_t *l,node_t *n)
{

    node_t *cur,*prev;

    pthread_mutex_lock(&l->lock);
    if(l->head == NULL)
    {
        pthread_mutex_unlock(&l->lock);
        return NULL;
    }
    else
    {
        cur = l->head;
        while(cur != NULL)
        {
            pthread_mutex_lock(&cur->lock);
            if(n != NULL)
            {
                pthread_mutex_lock(&n->lock);
                //comparing the data and checks for match.
            }
            else
            {
                //some message or something
                return NULL;
            }
        if(cur == l->head)
        {
            pthread_mutex_unlock(&l->lock);
        }
        else
        {
            pthread_mutex_unlock(&prev->lock);
            prev = cur;
            cur = cur->next;
        }
        pthread_mutex_unlock(&prev);
        return NULL;
    }
}

I'd like to point your attention at the if statement "n != NULL". There, if n is not NULL, then the thread will try to lock the node's mutex. If another thread has it and tries to delete the node, what would happen? Can I avoid it somehow? Is there a better way to write this?

CodePudding user response：

Question: If another thread has it and tries to delete the node, what would happen?

Very short answer: What happens is undefined and could be disastrous."

Short answer: Using the mutex of a node that is deleted in another thread, leads to undefined behaviour. This is also true when that other thread used the mutex during deletion and independent whether it deletes in a locked or unlocked state".

First off: this is a nice question!

Let's first explain what "deletion" means, so you can judge whether it applies to your situation. Second, it is explained why using the mutex of a deleted node results in undefined and thus incorrect behaviour (and likely disaster), even if that mutex is used at both deletion and matching in your example implementation of find_match(l,n). Third, some solutions are explored, though you might not like the answers.

Deletion

It is assumed that deleting an object also deallocates its memory. In C, it is expected that this happens via malloc() and free() or similar.

The case is explicitly ignored where a custom (de)allocator is used or where all nodes are allocated before all node processing, deallocated after all node processing, and never reused_. Where "reused" means means that a note is assigned to node->next or list->head after it is "deleted". In this "static" case, do unlock the node's mutex after deleting though, or all threads in pthread_mutex_lock(deleted_node) will patiently wait for a long time...

Why the undefined behaviour?

Deallocation of memory, for example by calling free(node), must be considered as destroying that memory: values written to it before destroying must be considered lost, while values read from it after destroying should be considered undefined or "random". The C language and runtime simply do not offer any guarantees that information can be conveyed across the destruction of memory, for example because memory is reused or the allocation implementation writes values to it for bookkeeping.

Therefore, any mechanism that depends on conveying information across deallocation of memory suffers from undefined behaviour and certainly is incorrect.

As a result, a mechanism that depends on a mutex inside an object that is possibly deleted / deallocated, also suffers from undefined and incorrect behaviour. As the behaviour is undefined, it does not matter if the party that deallocated the node used the mutex, locked or unlocked. The behaviour would simply be differently undefined, please check out "Locked-delete and magic values" for more elaboration on this.

Avoiding the situation

I do not exactly know what you were "asked to do". What is fixed and what are you allowed to change? Depending on that, solutions can range from "none" via "ugly" to "splendid" solutions.

Your current implementation tries to guard against the deletion of node n by using n->mutex; a part of n. As explained above, this leads to undefined behaviour and likely disaster.

To guard guard against deletion of node n, you must use something outside of the node that outlives it, let's define it as the owner, owner(n). In your example, owner(n) is either the list l_n if l_n->head is n, or a node p in l_n where p->next is p. Both l_n and p have a mutex, so locking that mutex before accessing n can work. Naturally, the function that deletes n, should lock that same mutex.

However, if you are not suppposed to change the function signature of find_match(l,n) and are also not supposed to change the data structure of the nodes and the list, there is no way you can know owner(n). After all, n has no link to its owner, nor is the owner a function argument. In this case, the only correct or safe way to solve the problem would be to have a kind of global lock - not part of a node or list - that guards both deleting and each matching of n with another node. Hm.

If you are allowed to change the signature to find_match(l,n,o), where o is the owner, this seems like a solution. However, unless o is the list that owns n itself, the same ownership problem occurs recursively: you cannot trust the mutex of a node p that has p->next == n, unless you lock the mutex of owner(p). Which you do not have. That means you should also have owner(p) as a parameter. In other words: it won't solve your problem.

Another way is to use find_match(l,n,list_that_contains_n) and always lock on the complete list when either traversing a list, matching an item from a list or deleting/adding an element (lock on both l and list_that_contains_n, of course). That way you won't need the mutexes from the nodes at all. This speeds up the functions themselves that traverse, delete and add, but reduces concurrent access to them. It depends on the domain and use-case whether that is better or worse. Locking per element, starting in list_that_contains_n->head, can work, but chances are that to do this correctly, you would have to lock all elements in list_that_contains_n that precede n. The reasons are similar to that of the recursive owner argument mentioned earlier. It can be that there is a solution that does not need to lock all these elements, but I guess it is probably not very readable. If someone can give a readable, correct, proven and efficient example without locking all elements that precede n: that would be nice!

If you can change the data structure of nodes and lists, you can add a back-reference to the owner, which is interesting since the owner can be both a list or a node. Better restructure the problem to using a full doubly-linked-list and find a correct implementation for that.

Conclusion

As you can see, you posed an interesting problem. If you can give a little more information on what you were asked to do, I think somebody - perhaps me - can give you a more precise or suitable answer!

Locked-delete and magic values

But what if I lock the node before deleting it?

Assume that a thread that deletes and deallocates a node, first locks the node's mutex.

If you unlock the node after deallocation, pthread_mutex_unlock(&node->lock) accesses forfeit memory, likely corrupting other parts of your program. Pray the memory is not reused.

If you do not unlock the node, several things can happen. If a thread that executes find_match(l,node) happens to be in pthread_mutex_lock(&node->lock), it will wait forever if the memory is not reused. If the memory is reused, it is undefined what happens if the thread wakes up, as the mutex is likely overwritten by something else. Pray that pthread_mutex_lock(&node->lock) just returns an error code...

But what if I place a magic value in the node lock?

There is no guarantee in the C language that just assigning a value to node->lock will ever be seen by another thread. For arguments' sake, let's assume there is such a guarantee on your platform and architecture.

The memory is still potentially reused, so it is still undefined what will happen.

Another question is: what magic value will you put in the node lock? The implementation of a mutex is platform-specific. And POSIX threads only defines a value of an INITIALIZED mutex, not for a mutex that should not be used. You could try pthread_mutex_destroy(node->lock) but, again, this helps nothing against reused memory.

CodePudding user response：

According to the doc of pthread_mutex_lock, it would return EINVAL if "The value specified by mutex does not refer to an initialized mutex object". So presumably you could nullify the lock field before freeing the node. Something like this:

n->lock = NULL;
free(n); // Or whatever you mean by destroying the node

And then when you use mutex_lock you can check its return value (which is a good practice anyway).