The OpenMP Specification imposes the following restriction on the barrier construct (see p. 259, lines 30-31):
Each barrier region must be encountered by all threads in a team or by none at all, unless cancellation has been requested for the innermost enclosing parallel region.
For completeness, the OpenMP specification defines a region as follows (cf. p. 5, lines 9 ff.):
region
All code encountered during a specific instance of the execution of a given construct, structured block sequence or OpenMP library routine. A region includes any code in called routines as well as any implementation code. [...]
I came up with a very simple example, and I am wondering whether it is valid at all, because the barriers are placed inside the branches of an if statement (so not every barrier is "seen" by each thread). Nevertheless, the number of barriers is identical for each thread, and experiments with two compilers show that the code works as expected.
#include <stdio.h>
#include <unistd.h>
#include <stdarg.h>
#include <sys/time.h>
#include "omp.h"

double zerotime;

/* Wall-clock time in seconds. */
double gettime(void) {
    struct timeval t;
    gettimeofday(&t, NULL);
    return t.tv_sec + t.tv_usec * 1e-6;
}

/* printf-style output prefixed with the elapsed time; serialized across threads. */
void print(const char *format, ...) {
    va_list args;
    va_start (args, format);
    #pragma omp critical
    {
        fprintf(stdout, "Time = %1.1lfs ", gettime() - zerotime);
        vfprintf (stdout, format, args);
    }
    va_end (args);
}

void barrier_test_1(void) {
    for (int i = 0; i < 5; i++) {
        if (omp_get_thread_num() % 2 == 0) {
            print("Path A: Thread %d waiting\n", omp_get_thread_num());
            #pragma omp barrier
        } else {
            print("Path B: Thread %d waiting\n", omp_get_thread_num());
            sleep(1);
            #pragma omp barrier
        }
    }
}

int main() {
    zerotime = gettime();
    #pragma omp parallel
    {
        barrier_test_1();
    }
    return 0;
}
For four threads I get the following output:
Time = 0.0s Path B: Thread 1 waiting
Time = 0.0s Path B: Thread 3 waiting
Time = 0.0s Path A: Thread 0 waiting
Time = 0.0s Path A: Thread 2 waiting
Time = 1.0s Path B: Thread 1 waiting
Time = 1.0s Path B: Thread 3 waiting
Time = 1.0s Path A: Thread 2 waiting
Time = 1.0s Path A: Thread 0 waiting
Time = 2.0s Path B: Thread 1 waiting
Time = 2.0s Path B: Thread 3 waiting
Time = 2.0s Path A: Thread 0 waiting
Time = 2.0s Path A: Thread 2 waiting
...
which shows that all the threads nicely wait for the slow Path B operation and pair up even though they are not placed in the same branch.
However, I am still confused from the specification, whether my code is at all valid.
Contrast this with CUDA, for example, where the following statement is given regarding the related __syncthreads() routine:
__syncthreads() is allowed in conditional code but only if the conditional evaluates identically across the entire thread block, otherwise the code execution is likely to hang or produce unintended side effects.
Thus, in CUDA, code like the above written in terms of __syncthreads() would be invalid, because the condition omp_get_thread_num() % 2 == 0 evaluates differently depending on the thread.
Answer:
The OpenMP Specification imposes the following restriction on the barrier construct (see p. 259, lines 30-31):
Each barrier region must be encountered by all threads in a team or by none at all, unless cancellation has been requested for the innermost enclosing parallel region.
That description is a bit problematic because barrier is a stand-alone directive. That means it has no associated code other than the directive itself, and therefore there is no such thing as a "barrier region".
Nevertheless, I think the intent is clear, both from the wording itself and from the conventional behavior of barrier implementations: absent any cancellation, if any thread in a team executing the innermost parallel region containing a given barrier construct reaches that barrier, then all threads in the team must reach that same barrier construct. Different barrier constructs represent different barriers, each requiring all threads to arrive before any proceed past.
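To illustrate the "all threads or none" reading, here is a minimal sketch of a barrier inside conditional code whose condition does not depend on the thread. The function names uniform_condition_barriers and run_team are hypothetical and not part of the question's code. Because the condition depends only on the loop index, which has the same value in every thread for a given iteration, either the whole team reaches the barrier or no thread does.

```c
#include <assert.h>

/* Hypothetical helper: returns how many barriers the calling thread passed. */
int uniform_condition_barriers(int iterations) {
    assert(iterations >= 0);
    int passed = 0;
    for (int i = 0; i < iterations; i++) {
        /* The condition uses only i, never the thread number, so it
           evaluates identically in every thread of the team. */
        if (i % 2 == 0) {
            #pragma omp barrier
            passed++;
        }
    }
    return passed;
}

/* Hypothetical driver: every thread of the team runs the same loop. */
int run_team(void) {
    int total = 0;
    #pragma omp parallel reduction(+:total)
    {
        total += uniform_condition_barriers(5);
    }
    return total;
}
```

With five iterations each thread passes three barriers (i = 0, 2, 4), so run_team() returns three times the team size. If the file is built without OpenMP support, the pragmas are ignored and the code runs serially.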
However, I am still confused from the specification, whether my code is at all valid.
I see that the behavior of your test code suggests that the two barriers are being treated as a single one. This is irrelevant to interpreting the specification, however, because your code indeed does not satisfy the requirement you asked about. The spec does not require the program to fail in any particular way in this case, but it certainly does not require the behavior you observe, either. You might well find that the program behaves differently with a different version of the compiler or a different OpenMP implementation. The compiler is entitled to assume that your OpenMP code conforms to the OpenMP spec.
Of course, in the case of your particular example, the solution is to replace the two barrier constructs in the different conditional branches with a single one immediately following the else block.
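A minimal sketch of that change, assuming the per-branch work stays inside the if/else and only the barrier moves out. The name barrier_test_fixed, its iterations parameter, and its return value are hypothetical additions for illustration; the printing and sleep(1) from the question are replaced by placeholder comments. The stub for omp_get_thread_num is only there so the sketch also builds when OpenMP is not enabled.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallback so the sketch also builds without OpenMP support. */
static int omp_get_thread_num(void) { return 0; }
#endif

/* Hypothetical rewrite of barrier_test_1: one barrier construct per
   iteration, reached by every thread regardless of the branch taken. */
int barrier_test_fixed(int iterations) {
    assert(iterations >= 0);
    int passed = 0;  /* barriers passed by the calling thread */
    for (int i = 0; i < iterations; i++) {
        if (omp_get_thread_num() % 2 == 0) {
            /* Path A: fast, thread-specific work goes here. */
        } else {
            /* Path B: slow, thread-specific work goes here. */
        }
        /* Outside the if/else, so all threads meet at the same construct. */
        #pragma omp barrier
        passed++;
    }
    return passed;
}
```

Called from inside a parallel region, as barrier_test_1 is in the question, every thread in the team now encounters the same barrier construct in each iteration, which is what the quoted restriction asks for.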