Ignore a request for parallel computation if a function lower in the stack has already started parallel computation

With OpenMP, when a function lower in the call stack has already started a parallel region, is it possible to make the OpenMP runtime ignore parallelization requests coming from the bodies of functions higher in the stack?

Is this the way OpenMP always works? If not, can I do this? How?

void do1()
{
    #pragma omp parallel for
    for (unsigned int i = 0; i < 10; ++i);
}
void do2()
{
    #pragma omp parallel for
    for (unsigned int i = 0; i < 10; ++i) do1();
}
void do3()
{
    #pragma omp parallel for
    for (unsigned int i = 0; i < 10; ++i) do2();
}
int main()
{
    do1(); // runs in parallel
    do2(); // do2() runs in parallel; I want the nested do1() calls to stay serial
    do3(); // do3() runs in parallel; I want the nested do2() and do1() calls to stay serial
}

CodePudding user response：

OpenMP works exactly as you described if nested parallelism is disabled. Current OpenMP implementations disable it by default, but the standard does not guarantee this, so to be on the safe side it is worth disabling it explicitly, either by setting the OMP_MAX_ACTIVE_LEVELS environment variable to 1 or by calling omp_set_max_active_levels(1) in your code.
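
To make this concrete, here is a minimal sketch of how one might check the effect of omp_set_max_active_levels(1). The function name max_inner_team_size and the loop bound are made up for illustration, and the #else branch is only a fallback so the snippet still builds when OpenMP is not enabled.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Fallbacks (assumption: serial build without -fopenmp). */
static int omp_get_num_threads(void) { return 1; }
static void omp_set_max_active_levels(int levels) { (void)levels; }
#endif

/* Hypothetical helper: returns the largest team size observed
   by any of the nested (inner) parallel regions. */
int max_inner_team_size(void)
{
    int max_inner = 0;

    /* Allow only one active level: the outermost parallel region. */
    omp_set_max_active_levels(1);

    #pragma omp parallel for
    for (int i = 0; i < 4; ++i) {
        /* Nested region: with one active level it runs with a team of 1. */
        #pragma omp parallel
        {
            int n = omp_get_num_threads();
            #pragma omp critical
            if (n > max_inner) max_inner = n;
        }
    }
    return max_inner;
}
```

With nesting limited to one active level, the inner regions should each report a team of a single thread, which is the behaviour asked for in the question. Setting OMP_MAX_ACTIVE_LEVELS=1 in the environment should have the same effect without touching the code.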

CodePudding user response：

This situation is called nesting, and it is generally inefficient (although this depends on the runtime). You can tweak OpenMP runtime parameters to mitigate the performance issue (see the answer of @Laci), but honestly, this pattern often causes more issues than it solves. The modern solution is simply to use tasks, and more specifically taskloop directives in your case. Alternatively, you can use the if(...) clause to tune the behaviour of a parallel section (e.g. disable parallelism when a given condition holds). The scheduling of tasks depends on the runtime: for example, IOMP (Clang/ICC) is based on a (randomized) work-stealing algorithm, while GCC uses a bounded centralized queue. The overhead of tasks can be higher than that of parallel sections (especially with static scheduling).
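
As an illustration of the if(...) approach, here is a rough sketch that forks a team only when the caller is not already inside an active parallel region. It reuses the do1/do2 names from the question; the reductions and return values are added purely so the result can be checked, and the #else branch is a fallback assumption for builds without OpenMP.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Fallback (assumption: serial build without -fopenmp). */
static int omp_in_parallel(void) { return 0; }
#endif

long do1(void)
{
    long sum = 0;
    /* Fork only if we are not already inside an active parallel region. */
    #pragma omp parallel for reduction(+:sum) if(!omp_in_parallel())
    for (int i = 0; i < 10; ++i) sum += i;
    return sum;
}

long do2(void)
{
    long sum = 0;
    /* Same guard: parallel when called from serial code, serial otherwise. */
    #pragma omp parallel for reduction(+:sum) if(!omp_in_parallel())
    for (int i = 0; i < 10; ++i) sum += do1();
    return sum;
}
```

Called from main, do1() runs its loop in parallel, while the do1() calls made from inside do2()'s parallel loop see omp_in_parallel() return true and run with a team of one thread. Unlike the global nesting setting, this guard is a per-region decision, so it can be combined with other conditions such as a minimum trip count.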

Note that compilers can optimize stack frames away. In fact, do1, do2 and do3 can be inlined so that no trace of these functions is left at runtime.