I have a multithreaded application in C that does some computations on a matrix. I use barriers to synchronise the work. I was getting a bunch of weird errors and non-deterministic behaviour, and I realised I had forgotten to check the return values of pthread_barrier_wait().
Here I declare some barriers globally:
pthread_barrier_t passa,passb,check;
I have a main function which does some initialization and then spawns workers:
double **compute (int p, double P, int n, double **a){
int r1 = pthread_barrier_init(&passa,NULL,p);
int r2 = pthread_barrier_init(&passb,NULL,p);
int r3 = pthread_barrier_init(&check,NULL,p);
if(r1!=0 || r2!=0 || r3!=0){printf("barrier init failed\n"); exit(1);} // chained == does not compare all three values to 0
pthread_t *threads = malloc(sizeof(pthread_t)*p);
//some admin stuff
//spawning threads in while loop
int err = pthread_create(&threads[i],NULL,&compute0,args);
if(err){
printf("Thread Creation Error, exiting..\n");
exit(1);
}
else{ //etc
Then I have the worker thread function compute0():
void *compute0(void *argsv){
//stuff
while(1){
b = pthread_barrier_wait(&check);
if(b != PTHREAD_BARRIER_SERIAL_THREAD|| b!= 0){
printf("b : %d\n",b);
printf("barrier failed\n"); exit(1);
}
//some checks
b = pthread_barrier_wait(&passa);
if(b != PTHREAD_BARRIER_SERIAL_THREAD|| b!= 0){
printf("barrier failed\n"); exit(1);
}
//First pass
// work
b = pthread_barrier_wait(&passb);
if(b != PTHREAD_BARRIER_SERIAL_THREAD || b!= 0){
printf("barrier failed\n"); exit(1);
}
//second pass
// more work
}
}
}
I never noticed this before, because I was not checking the return value, but the barrier waits appear to be failing. This is the output now that I check it:
note: one thread is used for control, computations will be run on 2 threads
Thread Created with ID : 139740189513280
Thread Created with ID : 139740181120576
================================================================
b : -1
barrier failed
b : b : 0
make: *** [Makefile:3: all] Error 1
What could be causing this?
Answer:
You have the following:
b != PTHREAD_BARRIER_SERIAL_THREAD || b != 0
It should be either of the following:
!( b == PTHREAD_BARRIER_SERIAL_THREAD || b == 0 )
b != PTHREAD_BARRIER_SERIAL_THREAD && b != 0
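Your condition is true for every value of b, since b can't equal both PTHREAD_BARRIER_SERIAL_THREAD and 0 at once. So you probably aren't actually getting an error. Your output supports this: the values printed are -1 and 0, and on Linux with glibc PTHREAD_BARRIER_SERIAL_THREAD is defined as -1, so both are success values. But let's assume an error really is reported once the condition is fixed. The documentation only lists EINVAL (indicating a bad argument) as a possible error. I wouldn't count on that being a complete list, though. pthread_barrier_wait() returns the error number directly rather than setting errno, so you can print the actual error by using the following: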
errno = b;
perror("pthread_barrier_wait");
exit(1);