Parallelize nested for loop with a small dependency-CodePudding

I'm trying to parallelize a nested for loop in OpenMP (C ) that sort of looks like this:

for(i = 0 ; i < a.size() ; i  ){
   for(j = 0 ; j < a.size() ; j  ){
      if(i!=j)
         a[i].update(a[j]);
   }
}

Where the whole jist is that the value of a[i] gets updated by the value of a[j]. The problem that I see here is that there's a dependency in which the update() method might use an old value of a[i], before it gets updated. I have a few ideas in mind involving collapse, shared and private variables, albeit I cannot test them as the server that I need to run this on is currently down, meaning I can't test my theories, so I would appreciate a nudge in the right direction -- What would be the correct pragma clauses that would allow me to execute this in parallel, efficiently?

My thoughts were to maybe keep i private and have a shared j so that the values that do get changed don't depend on one another, although it feels like that would create another dependency in which j might be equal to another i.

UPDATE 1: Is #pragma omp critical what I'm looking for?

CodePudding user response：

Using #pragma omp critical on the very core of your parallel loop will make your code to be serialized.

Also, there is a clear data race condition on your code: a[i] can be read and write by different threads at the same time

If memory is not a problem, the easiest way to parallelize it is by creating a copy of a data and passing it as a constant input to your algorithm

CodePudding user response：

In each i iteration, a[i] gets updated both with a[j] for j<i and j>i. The second category poses no problem, so let's completely ignore that. You could make a copy of a and read those elements from that copy. Your problem is with j<i because then you update with elements that themselves have been updated. In effect, a[i] depends on a[i-1] and lower indices. You have a dependency, and no critical/atomic will solve that.

So the i loop needs to be serial. Depending on the structure of your update function it may be possible to compute the updates for all j<i in parallel with some reduction, and then apply that to a[i]. But if the update function is complicated, even that may not be possible: in effect you'd have quantities a[i,j] that depend on a[i,j-1] and the whole thing is serial.