If I have this code,
subroutine min_distance(r,n,k,centroid,distance,indices,distancereg)
integer, intent(out):: n,k
real,dimension(:,:),intent(in),allocatable::centroid
real,dimension(:,:),intent(in),allocatable::r
integer,dimension(:),intent(out),allocatable::indices,distancereg
real ::d_min
integer::y,i_min,j,i
integer,parameter :: data_dim=2
allocate (indices(n))
allocate (distancereg(k))
!cost=0.d0
do j=1,n
i_min = -1
d_min=1.d6
do i=1,k
distance=0.d0
distancereg(i)=0.d0
do y=1,data_dim
distance = distance abs(r(y,j)-centroid(y,i))
distancereg(i)=distancereg(i) abs(r(y,j)-centroid(y,i))
end do
if (distance<d_min) then
d_min=distance
i_min=i
end if
end do
if( i_min < 0 ) print*," found error by assigning k-index to particle ",j
indices(j)=i_min
end do
What I want to do is, when I calculate distance for each k, I want to paralelize it. ie. Assign each thread to do it. For example if k=3, then for k=1 the distance calculated by thread 1, and so on. I have tried with omp_nested, omp_ordered, but still showing some error. will appreciate if there is any advice / guidance .
Thanks
CodePudding user response:
If you want to parallelize a loop (or loop nest) you have to wonder first which iterations are independent. In your case, each outer j
iteration computes an i_min
value that is 1. initialized in each i
iteration, and 2. written into location (j)
. So each i_min
calculation is independent and you can make the j
loop parallel. (You also have d_min
but that is never used.)
If the j
loop is long enough that should be enough to get high performance. You might be tempted to look at the next loop over i
. It computes a separate distance
value for each iteration, so that's again parallel. Except that you update i_min,d_min
, so you need to declare that loop a reduction
.
However, the two loops are not "perfectly nested", so you can not spread the total i,j
iteration space over the threads.
TLDR: your outer j
loop can be parallelized.
CodePudding user response:
What simply about:
do j=1,n
distancereg(:)=0.d0
!$OMP PARALLEL DO PRIVATE(y)
do i=1,k
do y=1,data_dim
distancereg(i)=distancereg(i) abs(r(y,j)-centroid(y,i))
end do
end do
!$OMP PARALLEL END DO
indices(j)=minloc(distancereg,dim=1)
end do
Since you are storing the distances for each i
, the search for the minimum value can be postponed after the loop on i
Or parallelizing the outer loop (here you don't need to store the distances):
!$OMP PARALLEL DO PRIVATE(i,y,i_min,d_min,distance)
do j=1,n
i_min = -1
d_min=1.d6
do i=1,k
distance=0.d0
do y=1,data_dim
distance = distance abs(r(y,j)-centroid(y,i))
end do
if (distance<d_min) then
d_min=distance
i_min=i
end if
end do
if( i_min < 0 ) print*," found error by assigning k-index to particle ",j
indices(j)=i_min
end do
!$OMP END PARALLEL DO