Distribution of loop iterations between threads with a specific order

Time:12-30

I have this serial code:

for (i=0; i<N; i++)
{
    printf ("\n i = %d\n", i);
    C[i] = 0;
    for (j=0; j<N; j++) C[i] += MAT[i][j] * B[j];
    x += C[i];
}

I want to write a parallel version. That is easy with a single pragma omp parallel for, but the difficult part is controlling the order in which the iterations are executed, like this:

 i = 0 
 i = n
 i = 1
 i = n-1
 //The rest of iterations

I could write a parallel version if I knew the number of threads in advance, but it has to print in that order with any even number of threads. I know I have to use omp_get_num_threads to track this, but I am not able to make it work. Thanks.

CodePudding user response:

The problem I see is that if you want to distribute the iterations like this: 0123...3210 (each digit is the thread number and its position is the iteration), you have to modify the loop. As far as I can see, you cannot do that unless each iteration i also handles iteration N-1-i, so the code would look like this:

#pragma omp parallel for private (i, j, k) reduction(+:x)
for (i=0; i<N/2; i++)
{
    k = N-1-i;   /* mirrored iteration handled by the same thread */
    printf ("\n i = %d ", i);
    printf ("\n k = %d\n", k);
    C[i] = 0;
    C[k] = 0;
    for (j=0; j<N; j++)
    {
        C[i] += MAT[i][j] * B[j];
        C[k] += MAT[k][j] * B[j];
    }
    x += C[i];
    x += C[k];
}

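So the same thread does iterations 0 and N-1, the next thread does 1 and N-2, and so on. You could even group more iterations into each pass of the loop, but remember that the number of iterations grouped per pass should be <= N. Note that with i<N/2 an odd N leaves the middle iteration unprocessed, so it would have to be handled separately.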
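The pairing described above can be sketched as a self-contained function. This is only an illustration under assumptions that are not in the question: a fixed compile-time size N, double-precision data, and a hypothetical function name paired_matvec_sum. The extra branch for an odd N is also an addition, since the loop over N/2 pairs would otherwise skip the middle row.

```c
#define N 6  /* assumed size, for illustration only */

/* Hypothetical helper: computes C = MAT * B and returns the sum of C,
   pairing row i with row N-1-i so that one thread handles both. */
double paired_matvec_sum(double MAT[N][N], double B[N], double C[N])
{
    double x = 0.0;
    int i;

    #pragma omp parallel for reduction(+:x)
    for (i = 0; i < N / 2; i++) {
        int k = N - 1 - i;   /* mirrored row, private to this iteration */
        int j;
        C[i] = 0.0;
        C[k] = 0.0;
        for (j = 0; j < N; j++) {
            C[i] += MAT[i][j] * B[j];
            C[k] += MAT[k][j] * B[j];
        }
        x += C[i] + C[k];
    }

    /* If N is odd, the middle row has no partner; handle it once here. */
    if (N % 2 != 0) {
        int m = N / 2;
        int j;
        C[m] = 0.0;
        for (j = 0; j < N; j++)
            C[m] += MAT[m][j] * B[j];
        x += C[m];
    }
    return x;
}
```

Declaring j and k inside the loop body makes them private automatically, so the private clause is not needed in this form. If the file is compiled without OpenMP support the pragma is ignored and the function simply runs serially.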

If you want to preserve the order 0, N-1, 1, N-2, ... you have to use the ordered clause together with an ordered block. However, this makes little sense for parallelization: the threads run concurrently until they reach the ordered region, and then they execute that part sequentially, in the same order as the serial version. Add the synchronization overhead between threads and you end up with something slower than the serial code.

    #pragma omp parallel for ordered private (i, j, k) reduction(+:x)
    for (i=0; i<N/2; i++)
    {
            /* The braces matter: without them, ordered would apply
               only to the first statement that follows it. */
            #pragma omp ordered
            {
                    k = N-1-i;
                    printf ("\n i = %d ", i);
                    printf ("\n k = %d\n", k);
                    C[i] = 0;
                    C[k] = 0;

                    for (j=0; j<N; j++)
                    {
                            C[i] += MAT[i][j] * B[j];
                            C[k] += MAT[k][j] * B[j];
                    }
                    x += C[i];
                    x += C[k];
            }
    }
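
One possible variation, which is not part of the answer above, is to keep only the printing inside the ordered region so that the matrix-vector work can still overlap between threads. The sketch below assumes an even, compile-time N and double data; the function name ordered_print_matvec and the seen array (which records the print order so it can be checked) are hypothetical. Whether this actually beats the serial loop depends on N and on how expensive the synchronization is.

```c
#include <stdio.h>

#define N 6  /* assumed even size, for illustration only */

/* Hypothetical helper: paired computation where only the printing is
   ordered. seen[] records the order in which rows were printed. */
double ordered_print_matvec(double MAT[N][N], double B[N],
                            double C[N], int seen[N])
{
    double x = 0.0;
    int pos = 0;   /* shared, but only touched inside the ordered region */
    int i;

    #pragma omp parallel for ordered schedule(static, 1) reduction(+:x)
    for (i = 0; i < N / 2; i++) {
        int k = N - 1 - i;
        int j;
        C[i] = 0.0;
        C[k] = 0.0;
        for (j = 0; j < N; j++) {   /* this part may run concurrently */
            C[i] += MAT[i][j] * B[j];
            C[k] += MAT[k][j] * B[j];
        }
        x += C[i] + C[k];

        #pragma omp ordered
        {                            /* executed in iteration order */
            printf("i = %d\nk = %d\n", i, k);
            seen[pos++] = i;
            seen[pos++] = k;
        }
    }
    return x;
}
```

With N = 6 the ordered region is entered for i = 0, 1, 2 in that order, so the rows are printed as 0, 5, 1, 4, 2, 3 regardless of how many threads run the loop.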