Known: number of processors: 28
Code 1:
void fun1() { printf("Hello, world\n"); }

#pragma omp parallel
{
    fun1();
}
Code 2:
void fun2() {
    #pragma omp for
    for (int i = 0; i < 10; i++) {
        printf("Hello, world\n");
    }
}

#pragma omp parallel
{
    fun2();
}
Code 3:
#pragma omp parallel
{
    #pragma omp for
    for (int i = 0; i < 10; i++) {
        printf("Hello, world\n");
    }
}
Results:
Code 1: printf is executed 28*1 = 28 times.
Code 2 is equivalent to Code 3: printf is executed 10 times. WHY? Why isn't printf executed 28*10 = 280 times, with each of the 28 threads responsible for the whole for-loop?
ORIGINAL POST:
Question:
Why does

#pragma omp parallel
{
    #pragma omp for
    for (int i = 0; i < N; i++) {}
}

result in every iteration of the loop being executed exactly once, rather than

#pragma omp for
for (int i = 0; i < N; i++) {}

(i.e., the code within the { } above) being executed as many times as there are threads (denote this M), as the semantics of "#pragma omp parallel" would suggest — that is, every iteration of the loop executed M times, once by each of the M threads?
Or can this kind of nesting of the for construct inside a parallel construct not be explained purely by the specification of "#pragma omp parallel", and instead depends on the implementation?
CodePudding user response:
The two basic concepts in OpenMP are: 1. the parallel region — when execution encounters omp parallel, a team of threads is created and each thread starts executing the region; and 2. "worksharing constructs", of which omp for is the most obvious one. If you have a team of threads, the work is distributed over those threads. So in both your codes 2 and 3 you create a team; the team then encounters the loop and distributes its iterations.
You are wondering why each thread does not execute the whole loop? That would happen if you omitted the omp for. In that case the loop is an instruction like any other, and each thread executes it in its entirety.
CodePudding user response:
This code:
#pragma omp for
for (int i = 0; i < N; i++) {}
is practically sequential code. As per the section "Worksharing-Loop Construct" of the OpenMP specification, the for construct needs a parallel construct to bind to. The parallel construct creates the threads that the for then uses to execute the loop in parallel. So you indeed have to write
#pragma omp parallel // creates the threads
{
#pragma omp for // execute in parallel
    for (int i = 0; i < N; i++) {}
}
You can use the shorter form, too:
#pragma omp parallel for // create threads & execute in parallel
for (int i = 0; i < N; i++) {}
UPDATE (to reflect the update to the original post):
Code 1 in the original post runs 28 threads in the parallel region, each calling the function and printing "Hello, world".
Code 2 and Code 3 both spawn 28 threads. In Code 2 each thread calls the function, and the for construct distributes the 10 loop iterations across the 28 threads. Since there are only 10 iterations, printf is invoked only 10 times, and at most 10 threads actively print; the other 18 do nothing. The same holds for Code 3.
The link I have provided explains what the for construct does.