OpenMP: multiple reductions in parallel


I have some code that looks like this:

double r1 = 0.0, r2 = 0.0;

for (size_t i = 0; i < k; ++i) {
    r1 += reduction1(data1[i]);
}

for (size_t i = 0; i < k; ++i) {
    r2 += reduction2(data2[i]);
}

The two reductions run over the same number of iterations, but otherwise follow different code paths and operate on different sets of data. I was wondering whether there is a way to run both reductions in parallel.

Bonus: what if the two loops ran for different numbers of iterations?

Edit: in my case k is fairly small, and most of the work is done inside the individual reduction functions. So my goal is to have as many calls to the reduction functions as possible run in parallel.

CodePudding user response:

If you can use one loop, it is very easy:

#pragma omp parallel for reduction(+:r1,r2)
for (size_t i = 0; i < k; ++i) {
    r1 += reduction1(data1[i]);
    r2 += reduction2(data2[i]);
}

If not, use tasks/taskloop:

#pragma omp parallel
#pragma omp single
{
    #pragma omp taskloop reduction(+:r1)
    for (size_t i = 0; i < k; ++i) {
        r1 += reduction1(data1[i]);
    }

    #pragma omp taskloop reduction(+:r2)
    for (size_t i = 0; i < k; ++i) {
        r2 += reduction2(data2[i]);
    }

}
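
Since most of the work is inside the reduction functions, a grainsize(1) clause can also be added to each taskloop so that every iteration becomes its own task. A minimal sketch of the first loop only, assuming a compiler with OpenMP 5.0 taskloop reduction support:

#pragma omp taskloop grainsize(1) reduction(+:r1)
for (size_t i = 0; i < k; ++i) {
    // grainsize(1): one task per call of reduction1, which helps
    // when k is small but each individual call is expensive
    r1 += reduction1(data1[i]);
}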

EDIT: You mentioned in the comments that k is small, so it is probably best to use a separate task for each reduction1/reduction2 call. Also, @JeromeRichard pointed out that with taskloop the second loop will wait for all tasks of the first loop to be completed. So, based on this new information, a better alternative may be something like this:

#pragma omp parallel
#pragma omp single
{
#pragma omp taskgroup task_reduction(+:r1,r2)
{
    for (size_t i = 0; i < k; ++i) {
        #pragma omp task in_reduction(+:r1)
        {
            r1 += reduction1(data1[i]);
        }
    }

    for (size_t i = 0; i < k; ++i) {
        #pragma omp task in_reduction(+:r2)
        {
            r2 += reduction2(data2[i]);
        }
    }
    }
}

}
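
As a usage sketch, the taskgroup version above can be wrapped in a small routine. The function name reduce_both and the stand-in bodies of reduction1/reduction2 here are hypothetical; only data1, data2, r1 and r2 mirror the question. Compile with -fopenmp (or your compiler's equivalent):

#include <cstddef>
#include <utility>
#include <vector>

// hypothetical stand-ins for the question's reduction functions
double reduction1(double x) { return x * x; }
double reduction2(double x) { return x + 1.0; }

std::pair<double, double> reduce_both(const std::vector<double>& data1,
                                      const std::vector<double>& data2)
{
    double r1 = 0.0, r2 = 0.0;

    #pragma omp parallel
    #pragma omp single
    {
        // one task per reduction call, all feeding the same taskgroup reduction
        #pragma omp taskgroup task_reduction(+:r1,r2)
        {
            for (std::size_t i = 0; i < data1.size(); ++i) {
                #pragma omp task in_reduction(+:r1)
                r1 += reduction1(data1[i]);
            }
            for (std::size_t i = 0; i < data2.size(); ++i) {
                #pragma omp task in_reduction(+:r2)
                r2 += reduction2(data2[i]);
            }
        }
    }
    return {r1, r2};
}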

You also mentioned in the comments that the parallel code is only 25% faster than the serial one, which suggests a more fundamental problem that should be investigated by profiling your program.

CodePudding user response:

What about:

double r1 = 0.0, r2 = 0.0;

#pragma omp parallel for reduction(+:r1,r2)
for (size_t i = 0; i < k1 + k2; ++i) {
    if (i < k1) {
        r1 += reduction1(data1[i]);
    } else {
        r2 += reduction2(data2[i-k1]);
    }
}

The two original loops don't even need to have the same number of iterations. If the work inside the reduction1()/reduction2() routines is unbalanced across iterations, a schedule(dynamic,1) clause may also help, as sketched below.
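
For example, a minimal sketch of the fused loop above with the schedule(dynamic,1) clause added (same k1, k2, data1 and data2 as before):

double r1 = 0.0, r2 = 0.0;

// hand out one iteration at a time so an expensive reduction call
// does not stall a thread's whole chunk
#pragma omp parallel for reduction(+:r1,r2) schedule(dynamic,1)
for (size_t i = 0; i < k1 + k2; ++i) {
    if (i < k1) {
        r1 += reduction1(data1[i]);
    } else {
        r2 += reduction2(data2[i-k1]);
    }
}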

EDIT: Or (though I'm not sure about this one):

double r1 = 0.0, r2 = 0.0;

#pragma omp parallel reduction(+:r1,r2)
{
    #pragma omp for nowait schedule(dynamic,1)
    for (size_t i = 0; i < k1; ++i) {
        r1 += reduction1(data1[i]);
    }
    #pragma omp for schedule(dynamic,1)
    for (size_t i = 0; i < k2; ++i) {
        r2 += reduction2(data2[i]);
    }
}