I'm trying to parallelize a nested for loop in OpenMP (C++) that looks roughly like this:
for(i = 0; i < a.size(); i++){
    for(j = 0; j < a.size(); j++){
        if(i != j)
            a[i].update(a[j]);
    }
}
The gist is that the value of a[i] gets updated using the value of a[j]. The problem I see is that there's a dependency: the update() method might use an old value of a[i], before it gets updated. I have a few ideas in mind involving collapse, shared and private variables, but I cannot test them because the server I need to run this on is currently down. I would appreciate a nudge in the right direction -- what would be the correct pragma clauses to execute this in parallel, efficiently?
My thought was to maybe keep i private and have j shared, so that the values being changed don't depend on one another, although it feels like that would create another dependency, since j might be equal to another thread's i.
UPDATE 1:
Is #pragma omp critical what I'm looking for?
UPDATE 2:
Upon further analysis, I have realized that the attribute that gets updated is not relevant to the rest of the operation, so there is no race on the current value of a[j]. Nevertheless, I still can't figure out how to parallelize this, as update is a void method that depends on the previous value of a[i] (something like a[i] = f(a[j]);). The return type can't be changed, so an atomic construct won't work since there is no single explicit operation, and neither will a reduction clause, while critical just makes it run serially. Any other suggestions?
CodePudding user response:
In each i iteration, a[i] gets updated with a[j] both for j<i and for j>i. The second category poses no problem, so let's completely ignore it: you could make a copy of a and read those elements from the copy. Your problem is with j<i, because then you update with elements that have themselves already been updated. In effect, a[i] depends on a[i-1] and lower indices. You have a dependency, and no critical/atomic will solve that.

So the i loop needs to be serial. Depending on the structure of your update function, it may be possible to compute the updates for all j in parallel with some reduction, and then apply the result to a[i]. But if the update function is complicated, even that may not be possible: in effect you'd have quantities a[i,j] that depend on a[i,j-1], and the whole thing is serial.
CodePudding user response:
Using #pragma omp critical on the very core of your parallel loop will serialize your code.

Also, there is a clear data race in your code: a[i] can be read and written by different threads at the same time.

If memory is not a problem, the easiest way to parallelize this is to create a copy of the a data and pass it as a constant input to your algorithm.