I'm attempting to parallelise some old fixed format Fortran code with OpenMP. I can't figure out how to fully parallelise the following structure of nested loops, with one outer loop and 2 sequential inner loops:
do y = 1,ny
do x = 1,nx
calculation 1
enddo
intermediate calculation (calculation 1)
do x = 1,nx
calculation 2 (intermediate calculation)
enddo
enddo
calculation 2
is such that it cannot be included in the first inner loop, and must be included in the outer loop as opposed to in a separate outer loop. This is due to a dependency on an intermediate calculation, itself dependent on all values of calculation 1
I am compiling with gfortran, and setting the environment variable OMP_NUM_THREADS=4
. The following code illustrates the 3 approaches I've tested:
c Test of using OpenMP to parallelise sequential inner loops and an
c outer loop
c Use export OMP_NUM_THREADS=4
program parallelTest
c declarations
implicit none
integer nx, ny
parameter (nx = 2, ny = 5)
integer omp_get_thread_num, i, j, A(nx,ny), B(nx,ny), C(nx,ny),
> D(nx,ny), E(nx,ny), F(nx,ny)
c initialisation
A = 7
B = 7
C = 7
D = 7
E = 7
F = 7
c attempt 1: just executes first loop, 2nd loop is ignored
c$omp parallel do shared(A,B) private(i,j) schedule(static) collapse(2)
do j = 1,ny
do i = 1,nx
A(i,j) = omp_get_thread_num()
enddo
do i = 1,nx
B(i,j) = omp_get_thread_num()
enddo
enddo
c$omp end parallel do
c attempt 2: only parallelises outer loop
c$omp parallel do shared(C,D) private(i,j) schedule(static)
do j = 1,ny
do i = 1,nx
C(i,j) = omp_get_thread_num()
enddo
do i = 1,nx
D(i,j) = omp_get_thread_num()
enddo
enddo
c$omp end parallel do
c attempt 3: only parallelises inner loops
c$omp parallel shared(E,F) private(i,j)
do j = 1,ny
c$omp do schedule(static)
do i = 1,nx
E(i,j) = omp_get_thread_num()
enddo
c$omp end do
c$omp do schedule(static)
do i = 1,nx
F(i,j) = omp_get_thread_num()
enddo
c$omp end do
enddo
c$omp end parallel
c print output to terminal
do i = 1,nx
print *, 'A(', i, ',:) = ', A(i,:)
enddo
print *
do i = 1,nx
print *, 'B(', i, ',:) = ', B(i,:)
enddo
print *
print *
do i = 1,nx
print *, 'C(', i, ',:) = ', C(i,:)
enddo
print *
do i = 1,nx
print *, 'D(', i, ',:) = ', D(i,:)
enddo
print *
print *
do i = 1,nx
print *, 'E(', i, ',:) = ', E(i,:)
enddo
print *
do i = 1,nx
print *, 'F(', i, ',:) = ', F(i,:)
enddo
end
This gives the following output:
A( 1 ,:) = 0 0 1 2 3
A( 2 ,:) = 0 1 1 2 3
B( 1 ,:) = 7 7 7 7 7
B( 2 ,:) = 7 7 7 7 7
C( 1 ,:) = 0 0 1 2 3
C( 2 ,:) = 0 0 1 2 3
D( 1 ,:) = 0 0 1 2 3
D( 2 ,:) = 0 0 1 2 3
E( 1 ,:) = 0 0 0 0 0
E( 2 ,:) = 1 1 1 1 1
F( 1 ,:) = 0 0 0 0 0
F( 2 ,:) = 1 1 1 1 1
In approach 1, the outer loop and first inner loop are parallelised as I expect, but the 2nd loop doesn't execute (presumably because the do
doesn't immediately follow a c$omp do
?). In approach 2, only the outer loop is parallelised. In approach 3, only the inner loops are parallelised.
My question is: how do I get matrix B
to be the same as matrix A
? This seems like it should be a straightforward task; I'm assuming there's an OpenMP clause or structure I'm not utilising, but would much appreciate some pointing in the right direction of what to search for (new to OpenMP!).
CodePudding user response:
If the information flow is as you have described it, I would re-write your loops as
do y = 1,ny
do x = 1,nx
calculation 1
enddo
enddo
do y = 1,ny
intermediate calculation (calculation 1)
enddo
do y = 1,ny
do x = 1,nx
calculation 2 (intermediate calculation)
enddo
enddo
And then parallelise each y
loop separately.