I'd like to run something like the following:
for (int index = 0; index < num; index )
I'd want to run the for loop with four threads, with the threads executing in the order: 0,1,2,3,4,5,6,7,8, etc...
That is, for the threads to be working on index =n,(n 1),(n 2),(n 3)
(in any particular ordering but always in this pattern), I want iterations of index = 0,1,2,...(n-1)
to already be finished.
Is there a way to do this? Ordered doesn't really work here as making the body an ordered section would basically remove all parallelism for me, and scheduling doesn't seem to work because I don't want a thread to be working on threads k->k index/4.
Thanks for any help!
CodePudding user response:
I'm not sure I understand your request correctly. If I try to summarize how I interpret it, that would be something like: "I want 4 threads sharing the iterations of a loop, with always the 4 threads running at most on 4 consecutive iterations of the loop".
If that's what you want, what about something like this:
int nths = 4;
#pragma omp parallel num_thread( nths )
for( int index_outer = 0; index_outer < num; index_outer = nths ) {
int end = min( index_outer nths, num );
#pragma omp for
for( int index = index_outer; index < end; index ) {
// the loop body just as before
} // there's a thread synchronization here
}
CodePudding user response:
You can do this with, not a parallel for loop, but a parallel region that manages its own loop inside, plus a barrier to make sure all running threads have hit the same point in it before being able to continue. Example:
#include <stdatomic.h>
#include <stdio.h>
#include <omp.h>
int main()
{
atomic_int chunk = 0;
int num = 12;
int nthreads = 4;
omp_set_num_threads(nthreads);
#pragma omp parallel shared(chunk, num, nthreads)
{
for (int index; (index = atomic_fetch_add(&chunk, 1)) < num; ) {
printf("In index %d\n", index);
fflush(stdout);
#pragma omp barrier
// For illustrative purposes only; not needed in real code
#pragma omp single
{
puts("After barrier");
fflush(stdout);
}
}
}
puts("Done");
return 0;
}
One possible output:
$ gcc -std=c11 -O -fopenmp -Wall -Wextra demo.c
$ ./a.out
In index 2
In index 3
In index 1
In index 0
After barrier
In index 4
In index 6
In index 5
In index 7
After barrier
In index 10
In index 9
In index 8
In index 11
After barrier
Done