OpenMP in nested do loop


I have this code:

subroutine min_distance(r,n,k,centroid,distance,indices,distancereg)
        integer, intent(in):: n,k
        real,dimension(:,:),intent(in),allocatable::centroid
        real,dimension(:,:),intent(in),allocatable::r
        integer,dimension(:),intent(out),allocatable::indices
        real,dimension(:),intent(out),allocatable::distancereg
        real ::d_min,distance
        integer::y,i_min,j,i
        integer,parameter :: data_dim=2
        allocate (indices(n))
        allocate (distancereg(k))
        !cost=0.d0
        
        do j=1,n

           i_min = -1
           d_min=1.d6
         
           do i=1,k
              distance=0.d0
              distancereg(i)=0.d0
              do y=1,data_dim
                 distance = distance + abs(r(y,j)-centroid(y,i))
                 distancereg(i)=distancereg(i) + abs(r(y,j)-centroid(y,i))
              end do
          
              if (distance<d_min) then
                 d_min=distance
                 i_min=i
              end if
           end do
           if( i_min < 0 ) print*," found error by assigning k-index to particle ",j
           indices(j)=i_min
        end do
end subroutine min_distance
        

What I want to do is parallelize the distance calculation over k, i.e. assign each thread to one value of k. For example, if k=3, then the distance for k=1 is calculated by thread 1, and so on. I have tried with omp_nested and omp_ordered, but it still shows some errors. I would appreciate any advice / guidance.

Thanks

CodePudding user response:

If you want to parallelize a loop (or loop nest), you first have to ask which iterations are independent. In your case, each outer j iteration computes an i_min value that is (1) re-initialized at the start of that j iteration and (2) written into indices(j). So each i_min calculation is independent and you can make the j loop parallel. (You also have d_min, but that is never used after the loop, so it can simply be private.)

If the j loop is long enough, that alone should be enough to get good performance. You might be tempted to also look at the inner loop over i. It computes a separate distance value for each iteration, so that is again parallel, except that you update i_min and d_min across iterations, so you would need to declare that loop a reduction.
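
For illustration, here is a minimal sketch of what such an inner-loop reduction could look like for one fixed j, using the variables from the question. OpenMP's built-in reduction(min:...) recovers the minimum distance, but not its index, so getting i_min this way would need a user-declared (argmin-style) reduction or a second pass over stored distances:

           ! Sketch only: reduces d_min over the i loop for one fixed j.
           ! i_min is not recovered here; that would need a user-declared
           ! reduction or a second pass over stored distances.
           d_min = huge(d_min)
           !$omp parallel do private(y, distance) reduction(min:d_min)
           do i = 1, k
              distance = 0.0
              do y = 1, data_dim
                 distance = distance + abs(r(y,j) - centroid(y,i))
              end do
              d_min = min(d_min, distance)
           end do
           !$omp end parallel do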

However, the two loops are not "perfectly nested" (there are statements between the two do statements), so you cannot use a collapse clause to spread the combined i,j iteration space over the threads.
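
For contrast, a hypothetical perfectly nested version would have to look something like the sketch below; dist (a k-by-n distance table) and d are made-up names that are not in the original code, and with this structure you are back to storing all distances and doing a minloc-style pass afterwards:

        ! Hypothetical sketch: collapse(2) is only valid because nothing
        ! sits between "do j" and "do i"; dist(k,n) and d are made up here.
        !$omp parallel do collapse(2) private(y, d)
        do j = 1, n
           do i = 1, k
              d = 0.0
              do y = 1, data_dim
                 d = d + abs(r(y,j) - centroid(y,i))
              end do
              dist(i, j) = d
           end do
        end do
        !$omp end parallel do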

TLDR: your outer j loop can be parallelized.

CodePudding user response:

What about simply:

        do j=1,n
           distancereg(:)=0.d0
           !$OMP PARALLEL DO PRIVATE(y)
           do i=1,k
              do y=1,data_dim
                 distancereg(i)=distancereg(i) + abs(r(y,j)-centroid(y,i))
              end do
           end do
           !$OMP END PARALLEL DO
           indices(j)=minloc(distancereg,dim=1)
        end do

Since you are storing the distances for each i, the search for the minimum value can be postponed until after the loop over i.

Or parallelizing the outer loop (here you don't need to store the distances):

        !$OMP PARALLEL DO PRIVATE(i,y,i_min,d_min,distance)
        do j=1,n
           i_min = -1
           d_min=1.d6         
           do i=1,k
              distance=0.d0
              do y=1,data_dim
                 distance = distance + abs(r(y,j)-centroid(y,i))
              end do
              if (distance<d_min) then
                 d_min=distance
                 i_min=i
              end if
           end do
           if( i_min < 0 ) print*," found error by assigning k-index to particle ",j
           indices(j)=i_min
        end do
        !$OMP END PARALLEL DO
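
If it helps for testing, here is a minimal, self-contained driver around the outer-loop version above. The sizes and random data are made up for illustration; build with OpenMP enabled, e.g. gfortran -O2 -fopenmp.

        ! Minimal test driver (hypothetical sizes, random data) for the
        ! parallel outer loop shown above.
        program test_min_distance
           implicit none
           integer, parameter :: data_dim = 2, n = 100000, k = 3
           real :: r(data_dim, n), centroid(data_dim, k), distance, d_min
           integer :: indices(n), i, j, y, i_min

           call random_number(r)
           call random_number(centroid)

           !$OMP PARALLEL DO PRIVATE(i,y,i_min,d_min,distance)
           do j = 1, n
              i_min = -1
              d_min = huge(d_min)
              do i = 1, k
                 distance = 0.0
                 do y = 1, data_dim
                    distance = distance + abs(r(y,j) - centroid(y,i))
                 end do
                 if (distance < d_min) then
                    d_min = distance
                    i_min = i
                 end if
              end do
              indices(j) = i_min
           end do
           !$OMP END PARALLEL DO

           print *, "first assignments:", indices(1:5)
        end program test_min_distance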