pthread_cond_wait never returning with EOWNERDEAD

Time:04-17

I tried sharing a mutex and a condition variable between two processes. One process owns the mutex and sets the condition variable while the other waits on the condition variable. My understanding is that the process currently holding the mutex is the "owner". When the owner process exits, locking that mutex should return an EOWNERDEAD error because the mutex is robust. So far this seems to work. But if I wait on the condition variable, EOWNERDEAD is never returned and the call blocks forever.

Creating the mutex and condition variable in the "owner" process:

struct InternalEvent {
    pthread_mutex_t lock;
    pthread_cond_t condSet;
    bool set;
    bool manualReset;
};

pthread_mutexattr_t mutexAttr;
pthread_condattr_t conditionAttr;

// O_EXCL makes creation fail if the file already exists.
int filedescriptor = ::open(name, O_RDWR | O_CREAT | O_EXCL, 0666);
if (filedescriptor < 0)
    return false;
// Size the file so that the mapping covers the whole struct.
if (ftruncate(filedescriptor, sizeof(InternalEvent)) != 0) {
    ::close(filedescriptor);
    return false;
}
// Robust, process-shared mutex and a process-shared condition variable.
pthread_mutexattr_init(&mutexAttr);
pthread_mutexattr_setrobust(&mutexAttr, PTHREAD_MUTEX_ROBUST);
pthread_mutexattr_setpshared(&mutexAttr, PTHREAD_PROCESS_SHARED);
pthread_condattr_init(&conditionAttr);
pthread_condattr_setpshared(&conditionAttr, PTHREAD_PROCESS_SHARED);
internalEvent = (InternalEvent*) mmap(NULL, sizeof(InternalEvent), PROT_READ | PROT_WRITE, MAP_SHARED, filedescriptor, 0);
::close(filedescriptor);  // the mapping stays valid after the descriptor is closed
if (internalEvent == MAP_FAILED)
    return false;
pthread_mutex_init(&internalEvent->lock, &mutexAttr);
pthread_cond_init(&internalEvent->condSet, &conditionAttr);
internalEvent->set = false;

Opening the mutex and the condition variable in another process:

filedescriptor = ::open(name, O_RDWR, 0666);
if (filedescriptor < 0)
    return false;
internalEvent = (InternalEvent*) mmap(NULL, sizeof(InternalEvent), PROT_READ | PROT_WRITE, MAP_SHARED, filedescriptor, 0);
::close(filedescriptor);  // the mapping stays valid after the descriptor is closed
if (internalEvent == MAP_FAILED)
    return false;

Setting the condition variable:

int res = pthread_mutex_lock(&internalEvent->lock);
if(res == EOWNERDEAD) {
    internalEvent->set = false;
    pthread_mutex_consistent(&internalEvent->lock);
}
if(!internalEvent->set)
    pthread_cond_broadcast(&internalEvent->condSet);
internalEvent->set = true;
pthread_mutex_unlock(&internalEvent->lock);

Waiting for the condition variable:

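int res = pthread_mutex_lock(&internalEvent->lock);

// Block until the event is set or an error (including EOWNERDEAD) is reported.
while (!internalEvent->set && res == 0) {
    res = pthread_cond_wait(&internalEvent->condSet, &internalEvent->lock);
}
if(res == 0 && !internalEvent->manualReset) {
    // Auto-reset event: consume the signal.
    internalEvent->set = false;
}
if(res == EOWNERDEAD) {
    // The lock is held but marked inconsistent; without this call the unlock
    // below would leave the mutex permanently unusable (ENOTRECOVERABLE).
    pthread_mutex_consistent(&internalEvent->lock);
}

pthread_mutex_unlock(&internalEvent->lock);
return res == 0;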

My question is: how can I detect the termination/crashing/exiting of the "owner" process in the blocking pthread_cond_wait call? I don't really need to restore the state of the mutex or the condition variable. I only want to detect the termination.

Edit: Is there perhaps some other way to wait on multiple mutexes in one blocking call? Then I could simply have one mutex per process and one mutex for the condition variable.

Answer:

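EOWNERDEAD describes the state of a mutex at the moment it is acquired, so it is first and foremost a return value of pthread_mutex_lock(). pthread_cond_wait() can report it only when it reacquires the mutex at the end of the wait (POSIX.1-2008 lists it for that case); the wait itself never produces it, because CVs do not have owners in the sense that mutexes do. There is therefore no reason to expect a wait that has not been woken to return EOWNERDEAD.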
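The mutex side of this can be observed with a small experiment. The following is a minimal sketch, assuming Linux with glibc; the function name lockAfterOwnerDeath is hypothetical, and an anonymous shared mapping with fork() stands in for the question's file-backed mapping and separate programs.

```cpp
#include <cerrno>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: returns what pthread_mutex_lock() reports after a
// child process has exited while holding a robust, process-shared mutex.
// Returns -1 if the setup itself fails.
int lockAfterOwnerDeath() {
    // Anonymous shared mapping so that parent and child see the same mutex.
    void* mem = mmap(nullptr, sizeof(pthread_mutex_t), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    pthread_mutex_t* lock = static_cast<pthread_mutex_t*>(mem);

    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutex_init(lock, &attr);
    pthread_mutexattr_destroy(&attr);

    pid_t child = fork();
    if (child < 0) {
        munmap(mem, sizeof(pthread_mutex_t));
        return -1;
    }
    if (child == 0) {
        pthread_mutex_lock(lock);
        _exit(0);  // the child dies while still holding the lock
    }
    waitpid(child, nullptr, 0);  // make sure the owner is really gone

    int res = pthread_mutex_lock(lock);
    if (res == EOWNERDEAD)
        pthread_mutex_consistent(lock);  // we hold the lock; mark it usable again
    if (res == 0 || res == EOWNERDEAD)
        pthread_mutex_unlock(lock);

    pthread_mutex_destroy(lock);
    munmap(mem, sizeof(pthread_mutex_t));
    return res;
}
```

The death of the owner only becomes visible because the parent actively tries to take the lock. A process that is parked inside pthread_cond_wait() makes no such attempt until it is woken.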

Moreover, a thread waiting on a CV specifically does not hold the associated mutex for the duration of the wait, and will not try to reacquire it until the wait is over. If the CV isn't signaled then there isn't any reason for the thread even to notice that the mutex owner is dead.*

The bottom line is that POSIX does not define robust condition variables in the same sense that it defines robust mutexes, and using a robust mutex with a CV does not impart robustness on the CV. So,

how can I detect the termination/crashing/exiting of the "owner" process in the blocking pthread_cond_wait call?

You can't. Condition variables do not have that capability.

If the risk that the failure of one process causes a hang in another is unacceptable to you, then you have several options, among them:

  • use threads instead of processes;
  • use pthread_cond_timedwait() instead of pthread_cond_wait(), and abort or attempt recovery if the wait times out;
  • use a second thread in the waiting process to monitor the state of the other cooperating process(es), and to take corrective action when necessary.

*And that is potentially a more fundamental issue here. Is the process that dies while holding the mutex the (only) one that you expect to signal the CV? If so, then you're never going to see that signal in your failure case.
