How to best parallelize nested for loop that calls a variable update function?

Time:12-14

I'm trying to parallelize a nested for loop in OpenMP (C++) that looks roughly like this:

for(i = 0; i < a.size(); i++){
   for(j = 0; j < a.size(); j++){
      if(i != j)
         a[i].update(a[j]);
   }
}

The gist is that the value of a[i] gets updated using the value of a[j]. The problem I see here is that there's a dependency: the update() method might use an old value of a[i], before it gets updated. I have a few ideas in mind involving collapse, shared, and private variables, but I cannot test them as the server that I need to run this on is currently down, so I would appreciate a nudge in the right direction -- what would be the correct pragma clauses that would allow me to execute this in parallel, efficiently?

My thought was to maybe keep i private and have j shared, so that the values that do get changed don't depend on one another, although it feels like that would create another dependency, since one thread's j might be equal to another thread's i.

UPDATE 1: Is #pragma omp critical what I'm looking for?

UPDATE 2: Upon further analysis, I have realized that the attribute that gets updated is not relevant to the rest of the operation, so there is no race on the current value of a[j]. Nevertheless, I still can't figure out how to parallelize this, because update is a void method that depends on the previous value of a[i] (something like a[i] = f(a[j]);). The return type can't be changed, so an atomic construct won't work since there is no single explicit operation, and neither will a reduction clause, whereas critical just runs it serially. Any other suggestions?

CodePudding user response:

In each i iteration, a[i] gets updated with a[j] both for j<i and for j>i. The second category poses no problem, so let's ignore it completely; you could make a copy of a and read those elements from that copy. Your problem is with j<i, because then you update with elements that have themselves already been updated. In effect, a[i] depends on a[i-1] and lower indices. You have a true dependency, and no critical/atomic will remove that.

So the i loop needs to stay serial. Depending on the structure of your update function, it may be possible to compute the updates for all j<i in parallel with some reduction, and then apply the result to a[i]. But if the update function is complicated, even that may not be possible: in effect you'd have quantities a[i,j] that depend on a[i,j-1], and the whole thing is serial.
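If update() does happen to decompose into an associative accumulation (a big "if", as noted above), the serial-outer/parallel-inner pattern could be sketched like this. Elem, its f() method, and the += accumulation are hypothetical placeholders, not from the original code:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical element type: assume a[i].update(a[j]) reduces to an
// associative accumulation such as a[i].v += f(a[j]). Only under that
// assumption can the inner j-loop become a reduction.
struct Elem {
    double v;
    double f() const { return 0.5 * v; }  // placeholder contribution
};

void run(std::vector<Elem>& a) {
    // Outer loop stays serial: a[i] depends on the already-updated a[0..i-1].
    for (std::size_t i = 0; i < a.size(); ++i) {
        double acc = 0.0;
        // Inner loop only reads the other elements, so it can be a
        // parallel reduction with no data race.
        #pragma omp parallel for reduction(+:acc)
        for (std::size_t j = 0; j < a.size(); ++j) {
            if (i != j)
                acc += a[j].f();
        }
        a[i].v += acc;  // single serial write per i
    }
}
```

This preserves the sequential semantics: iteration i still sees the updated values of a[0..i-1] and the original values of a[j] for j>i, exactly as the serial loop would.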

CodePudding user response:

Using #pragma omp critical around the very core of your parallel loop will serialize your code.

Also, there is a clear data race condition in your code: a[i] can be read and written by different threads at the same time.

If memory is not a problem, the easiest way to parallelize this is to create a copy of a and pass it as a constant (read-only) input to your algorithm.
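A minimal sketch of that copy-based approach, assuming (per the asker's UPDATE 2) that update() is allowed to read stale values of the other elements; Particle and its update() body are illustrative stand-ins, not the original types:

```cpp
#include <cstddef>
#include <vector>

// Illustrative stand-in for the asker's element type.
struct Particle {
    double x;
    void update(const Particle& other) { x += 0.1 * other.x; }  // hypothetical
};

void step(std::vector<Particle>& a) {
    const std::vector<Particle> snapshot = a;  // read-only copy: no races on reads
    #pragma omp parallel for
    for (long i = 0; i < (long)a.size(); ++i) {  // each thread writes only a[i]
        for (std::size_t j = 0; j < snapshot.size(); ++j) {
            if ((std::size_t)i != j)
                a[i].update(snapshot[j]);  // all reads go through the snapshot
        }
    }
}
```

Because every a[i] is now updated only from the pre-step values in the snapshot, the result no longer depends on iteration order, which is valid precisely because of the observation in UPDATE 2 that the order doesn't matter here.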
