Here is the current version of my code:
# pragma omp parallel
{
...
...
...
# pragma omp single nowait
for (int i = 0; i < M; i++) {
centroids[points[i].cluster].points_in_cluster++;
}
for (int i = 0; i < M; i++) { //I want thread_count - 1 to be working here
# pragma omp for
for (int coord = 0; coord < N; coord++){
//int my_tid = omp_get_thread_num();
//printf("my tid:%d my_coord: %d my i:%d\n ", my_tid, coord, i);
centroids[points[i].cluster].accumulator.coordinates[coord] += points[i].coordinates[coord];
}
}
# pragma omp barrier
...
...
...
}
This already works fine, but I want to see whether the run time can be improved by having one thread execute the loop under the omp single pragma while the other threads execute the loop below it, without that thread's help. So with 8 threads, 1 would run the single section and the other 7 would run the second part.
I tried omp sections, but it didn't work: it failed with the message "work-sharing region may not be closely nested inside of work-sharing".
CodePudding user response:
You can use tasks to solve your problem. In this case one thread will run the first loop and all the other threads will run the second loop.
#pragma omp parallel
#pragma omp single
{
#pragma omp task
{
// one task (thread) runs this part of the code
}
#pragma omp taskloop num_tasks(omp_get_num_threads()-1)
for (....){
// all other tasks (threads) run this loop
}
}
By the way, I don't think this code will run faster than the original approach, since tasks carry more overhead than a plain work-sharing loop.