Home > Back-end >  Fastest communication between threads in C
Fastest communication between threads in C

Time:07-14

This question is similar to this one about threads, but different.

I have a program that splits a time consuming calculation into different jobs that can be run in parallel. Specifically there is step A and then step B, which repeat multiple times. So the program runs ABABAB... Etc.

So I have one thread that controls things and its job is to tell N other threads to start step A and then to start step B. It is important to only start the next step after all threads of the previous step have finished.

So pseudo code might be something like

 for (n=0;n<N;n  )
   Set variable in common memory to 1 for each thread to start step A;
 do {
  sum = 0; add variable for each thread to sum in a loop
   } while (sum!=0)
/* Threads set their variable to 0 when their calculations are complete*/
... < Similar code to start and monitor step B>

So my question is... ... Is there a better way to send messages between threads (I guess the answer is yes)... Are any of the better ways faster or at least as fast? (I guess the answer may be no.. but really interested to hear any thoughts about this)

Oh and should have said the aim is to have each process locked onto a single core for speed.

Edit

Many thanks for all the comments and the suggestion to look at C pthread synchronize function - that question is very relevant - it does not, however, answer the question about speed. - In the answer of @Willis he makes the point about all the threads spinning constantly to check the status of variables in common memory... and comments mention the potential issues of race conditions and difficulty of debugging....

.... Despite the potential drawbacks of the method I have tried to outline in the pseudo code I am interested to know if the suggested 'barrier' method in pthread.h will be as fast, faster or slower than the method in the pseudo code.

Maybe I could have been clearer in the original question above... that the steps A and B will need to be repeated 1e6 to 1e9 times - or as many times as possible in reasonable time and so I am very wary of any potential hold up in synchronising the threads.... Does this make sense?

CodePudding user response:

Exact synchronization tools available to you are going to depend almost entirely on the version (and/or flavor) of C you're using and your OS, so whether or not your compiler supports C11 <threads.h>, whether you're using a POSIX based system, which would give you access to <pthread.h> and <semaphore.h> etc. The overall behavior of the synchronization tools won't vary much from system to system, but exact initialization and call syntax will.

The problem with trying to solve this problem how your pseudo code suggests, (without a synchronization library,) is you're going to end up with the control thread spinning in a loop checking every single worker thread's progress until they're all complete, while every worker that has finished the current task is going to be spinning in a loop checking over and over again if the control thread has given it the go-ahead to continue on, which is a huge waste of CPU resources, and can be a nightmare to debug for complicated race-condition and atomic-access reasons.

The proper way to synchronize threads in an efficient way is to let the operating system do the scheduling. This means you're looking for a way to tell each thread to go to sleep when it reaches the point between steps A and B, then have the OS wake them all up when all N worker threads have finished A, so they can do B then all go to sleep again, and so on.

This is an example of a synchronized threading system using C11 <threads.h> that creates N threads that each call doA(), then sleep until the last one finishes, then each call doB() and synchronize again then start over. This is effectively a manual implementation of a pthread barrier since they aren't available in the <threads.h> library.

#define N 20 //Or however many threads you need
#define NUM_LOOPS 4 //However many times AB must be done

//One instance of this struct will sit on the stack of main()
//and every worker thread will have a reference to it
struct worker{
  volatile int count; //Number of threads finished with current part of the job
  mtx_t mutex;        //Make accessing "count" thread-safe
  cnd_t condition;    //Synchronize wakeup and sleep
};


int worker_thread(void *pt){
  struct worker *shared = pt; //pt is actually a pointer to synchronization data in main()

  for(int c = 0;c < NUM_LOOPS;   c){

    doA();
  
    mtx_lock(&shared->mutex);
    if(  shared->count != N){ //If I am not the last to finish A
      cnd_wait(&shared->condition, &shared->mutex); //Unlock mutex and go to sleep
      mtx_unlock(&shared->mutex);  //cnd_wait() automatically re-acquires mutex here
    }
    else{
      shared->count = 0;                  //Reset count
      mtx_unlock(&shared->mutex);         //Unlock the mutex
      cnd_broadcast(&shared->condition);  //Wake up the sleeping worker threads
    }
  
    doB();

    mtx_lock(&shared->mutex);
    if(  shared->count != N){ //If I am not the last to finish B
      cnd_wait(&shared->condition, &shared->mutex); //Unlock mutex and go to sleep
      mtx_unlock(&shared->mutex);  //cnd_wait() automatically re-acquires mutex here
    }
    else{
      shared->count = 0;                  //Reset count
      mtx_unlock(&shared->mutex);         //Unlock the mutex
      cnd_broadcast(&shared->condition);  //Wake up the sleeping worker threads
    }

  }
  return 0;
}

int main(){
  thrd_t threads[N];
  struct worker shared;
  shared.count = 0;
  mtx_init(&shared.mutex, mtx_plain); //To allow for thread-safe access to count
  cnd_init(&shared.condition);        //To synchronize sleep and wake-up

  //Create all N threads
  for(int c = 0;c < N;   c)
    thrd_create(&threads[c], worker_thread, &shared); //Pass reference to synchronization data

  //Reap all N threads, sleeping while waiting for them to terminate
  for(int c = 0;c < N;   c)
    thrd_join(threads[c],NULL);

  //Cleanup
  mtx_destroy(&shared.mutex);
  cnd_destroy(&shared.condition);
}

CodePudding user response:

So my question is... ... Is there a better way to send messages between threads (I guess the answer is yes)...

The Swiss army knife of thread synchronization is the condition variable, which works in conjunction with a mutex and some shared data (such as your flag variables). You could build a reasonable solution with that, and that would probably be the right choice if indeed you want to have a separate manager thread controlling when the others step from phase to phase.

For your particular problem, however, it sounds like a barrier -- or probably two -- would afford a simpler solution. That could also remove the need for a separate manager thread.

Are any of the better ways faster or at least as fast? (I guess the answer may be no.. but really interested to hear any thoughts about this)

[...]

.... Despite the potential drawbacks of the method I have tried to outline in the pseudo code I am interested to know if the suggested 'barrier' method in pthread.h will be as fast, faster or slower than the method in the pseudo code.

Inasmuch as the pseudocode appears to demonstrate two or more threads accessing a shared variable without synchronization, with some of the accesses being writes, the speed question is moot. The behavior of a program in which such a pattern of activity occurs is undefined.

In practice, that might very well manifest as one thread not observing some or all of the other's writes in a timely manner, or perhaps ever. Naturally, that would be devastating to your scheme. There is no point in debating the performance of code that is semantically wrong.

It's unclear whether it makes sense to worry about the IPC speed. If your steps are so fine-grained that the synchronization costs constitute a significant fraction of the overall time, then you need to make each thread do more work between needing to synchronize. If not, then you are wasting your time by worrying about it.

  • Related