How can i avoid the "i" dependency in this loop? Fortran-CodePudding

I'm working with OpenMP in my code, but in order to do that I have to solve this dependency:

do q=1,pppp
    i=0
    
    DO j=1, pppp
        do c1=1,3
            vect(c1)=xx(q,c1)-xx(j,c1)
        end do
        dist=sqrt(vect(1)**2 vect(2)**2 vect(3)**2)
        if(dist<0.0001)then
            i=i 1
            if(i>10)i=10
            caravec(q,i)=j
        endif
    ENDDO
ENDDO

I'm trying to avoid the ordered clause, because it is expensive, but I don't know how to remove the dependency. How can I do that?
Thanks for all the help

CodePudding user response：

DISCLAIMER: I have never programmed with Fortan. So my code most-likely has bugs.

But using logic and assuming that what you have stated is correct i.e., that the only dependency is the 'i' variable. You can try the following:

do q=1,pppp        
    DO j=1, pppp
        do c1=1,3
            vect(c1)=xx(q,c1)-xx(j,c1)
        end do
        dist(q,j)=sqrt(vect(1)**2 vect(2)**2 vect(3)**2)
    ENDDO
ENDDO

So you parallelize the outer loop with OpenMP. This way you can run in parallel the most computational demanding part of the loop without the 'i' dependency. Notice that I am saving the result of the distances into a vector.

Then sequentially you can do the remaining loop computation as follows:

do q=1,pppp
    i=0    
    DO j=1, pppp
        if(dist(q,j) < 0.0001)then
            i=i 1
            if(i>10)i=10
            caravec(q,i)=j
        endif
    ENDDO
ENDDO

The downside of the current approach is that besides requiring refactoring of the code it also uses more memory than the original one. On the other hand it removes the need for synchronization among threads.

As @Ian Bush mentioned in the comments you can further improve your sequential code by

testing n distance squared rather than distance will avoid the call to sqrt

CodePudding user response：

There are a few Fortran things you can do to improve on @dreamcrash' excellent answer:

Fortran is column-major, so you want to iterate over left-most indices in inner loops, not right-most indices. So you should probably transpose xx, so it's a 3*pppp array indexed as xx(c1,q) rather than a pppp*3 array indexed as xx(q,c1).
You probably want to use whole-array operations rather than single-element operations, as these have a better chance of being vectorised.
You can store the result of dist<0.0001 rather than dist.
You can replace the if (i>10)... with a min statement, which will be branchless and so likely run faster.

So the code would look something like:

do q=1,pppp        
  do j=1,pppp
    vect = xx(:,q)-xx(:,j)
    within_tolerance(j, q) = dot_product(vect, vect) < 1.0e-8
  enddo
enddo

do q=1,pppp
  i=0    
  do j=1,pppp
    if (within_tolerance(j, q)) then
      i = min(i 1, 10)
      caravec(q,i)=j
    endif
  enddo
enddo

You may also want to transpose caravec, but this will depend on how it is used elsewhere.

An XY problem solution

If this is the bottleneck of your code, you might want to look into voxel-based methods for finding nearest-neighbours within sets of vectors. A quick google brings up e.g. this.

Voxel methods remove the need for the double loop over q and j, which can potentially let you do these kinds of comparisons much faster.

On the other hand, voxel methods are quite complicated, and I don't think they're appropriate for all circumstances (e.g. if you have very dense regions of points and very sparse regions of points then I think a voxel method would struggle).