I am trying to do distributed search using omp.h. I am creating 4 threads. Thread with id 0 does not perform the search instead it overseas which thread has found the number in array. Below is my code:
int arr[15]; //This array is randomly populated
int process=0,i=0,size=15; bool found=false;
#pragma omp parallel num_threads(4)
{
int thread_id = omp_get_thread_num();
#pragma omp cancellation point parallel
if(thread_id==0){
while(found==false){ continue; }
if(found==true){
cout<<"Number found by thread: "<<process<<endl;
#pragma omp cancel parallel
}
}
else{
#pragma omp parallel for schedule(static,5)
for(i=0;i<size;i ){
if(arr[i]==number){ //number is a int variable and its value is taken from user
found = true;
process = thread_id;
}
cout<<i<<endl;
}
}
}
The problem i am having is that each thread is executing for loop from i=0 till i=14. According to my understanding omp divides the iteration of the loops but this is not happening here. Can anyone tell me why and its possible solution?
CodePudding user response:
Your problem is that you have a parallel
inside a parallel
. That means that each thread from the first parallel region makes a new team. That is called nested parallelism and it is allowed, but by default it's turned off. So each thread creates a team of 1 thread, which then executes its part of the for loop, which is the whole loop.
So your omp parallel for
should be omp for
.
But now there is another problem: your loop is going to be distributed over all threads, except that thread zero never gets to the loop. So you get deadlock.
.... and the actual solution to your problem is a lot more complicated. It involves creating two tasks, one that spins on the shared variable, and one that does the parallel search.
#pragma omp parallel
{
# pragma omp single
{
int p = omp_get_num_threads();
int found = 0;
# pragma omp taskgroup
{
/*
* Task 1 listens to the shared variable
*/
# pragma omp task shared(found)
{
while (!found) {
if (omp_get_thread_num()<0) printf("spin\n");
continue; }
printf("found!\n");
# pragma omp cancel taskgroup
} // end 1st task
/*
* Task 2 does something in parallel,
* sets `found' to true if found
*/
# pragma omp task shared(found)
{
# pragma omp parallel num_threads(p-1)
# pragma omp for
for (int i=0; i<p; i )
// silly test
if (omp_get_thread_num()==2) {
printf("two!\n");
found = 1;
}
} // end 2nd task
} // end taskgroup
}
}
Bonus solution:
#pragma omp parallel num_threads(4)
{
if(omp_get_thread_num()==0){ spin_on_found; }
if(omp_get_thread_num()!=0){
#pragma omp for nowait schedule(dynamic)
for ( loop ) stuff
So the combination of dynamic
and nowait
can somehow deal with the missing thread.
CodePudding user response:
@Victor Eijkhout already explained what happened here, I just want to show you a simpler (and data race free) solution.
Note that OpenMP has a significant overhead, in your case the overheads are bigger than the gain by parallelization. So, the best idea is not to use parallelization in this case.
If you do some expensive work inside the loop, the simplest solution is to skip this expensive work if it is not necessary. Note that I have used
#pragma omp critical
beforefound = true;
to avoid data race.
#pragma omp parallel for
for(int i=0; i<size;i ){
if(found) continue;
// some expensive work here
if(CONDITION){
#pragma omp critical
found = true;
}
}
- Another alternative is to use
#pragma omp cancel for
#pragma omp parallel
#pragma omp for
for(int i=0; i<size;i ){
#pragma omp cancellation point for
// some expensive work here
if(CONDITION){
//cancelling the for loop
#pragma omp cancel for
}
}