I'm working with OpenMP in my code, but in order to do that I have to solve this dependency:
do q=1,pppp
i=0
DO j=1, pppp
do c1=1,3
vect(c1)=xx(q,c1)-xx(j,c1)
end do
dist=sqrt(vect(1)**2 vect(2)**2 vect(3)**2)
if(dist<0.0001)then
i=i 1
if(i>10)i=10
caravec(q,i)=j
endif
ENDDO
ENDDO
I'm trying to avoid the ordered
clause, because it is expensive, but I don't know how to remove the dependency. How can I do that?
Thanks for all the help
CodePudding user response:
DISCLAIMER: I have never programmed with Fortan. So my code most-likely has bugs.
But using logic and assuming that what you have stated is correct i.e., that the only dependency is the 'i' variable. You can try the following:
do q=1,pppp
DO j=1, pppp
do c1=1,3
vect(c1)=xx(q,c1)-xx(j,c1)
end do
dist(q,j)=sqrt(vect(1)**2 vect(2)**2 vect(3)**2)
ENDDO
ENDDO
So you parallelize the outer loop with OpenMP. This way you can run in parallel the most computational demanding part of the loop without the 'i' dependency. Notice that I am saving the result of the distances into a vector.
Then sequentially you can do the remaining loop computation as follows:
do q=1,pppp
i=0
DO j=1, pppp
if(dist(q,j) < 0.0001)then
i=i 1
if(i>10)i=10
caravec(q,i)=j
endif
ENDDO
ENDDO
The downside of the current approach is that besides requiring refactoring of the code it also uses more memory than the original one. On the other hand it removes the need for synchronization among threads.
As @Ian Bush mentioned in the comments you can further improve your sequential code by
testing n distance squared rather than distance will avoid the call to sqrt
CodePudding user response:
There are a few Fortran things you can do to improve on @dreamcrash' excellent answer:
Fortran is column-major, so you want to iterate over left-most indices in inner loops, not right-most indices. So you should probably transpose
xx
, so it's a3*pppp
array indexed asxx(c1,q)
rather than apppp*3
array indexed asxx(q,c1)
.You probably want to use whole-array operations rather than single-element operations, as these have a better chance of being vectorised.
You can store the result of
dist<0.0001
rather thandist
.You can replace the
if (i>10)...
with amin
statement, which will be branchless and so likely run faster.
So the code would look something like:
do q=1,pppp
do j=1,pppp
vect = xx(:,q)-xx(:,j)
within_tolerance(j, q) = dot_product(vect, vect) < 1.0e-8
enddo
enddo
do q=1,pppp
i=0
do j=1,pppp
if (within_tolerance(j, q)) then
i = min(i 1, 10)
caravec(q,i)=j
endif
enddo
enddo
You may also want to transpose caravec
, but this will depend on how it is used elsewhere.
An XY problem solution
If this is the bottleneck of your code, you might want to look into voxel-based methods for finding nearest-neighbours within sets of vectors. A quick google brings up e.g. this.
Voxel methods remove the need for the double loop over q
and j
, which can potentially let you do these kinds of comparisons much faster.
On the other hand, voxel methods are quite complicated, and I don't think they're appropriate for all circumstances (e.g. if you have very dense regions of points and very sparse regions of points then I think a voxel method would struggle).