Home > Software engineering >  How to use the lock routines and the sleep function in order to get rid of the explicite barriers of
How to use the lock routines and the sleep function in order to get rid of the explicite barriers of

Time:10-05

I implemented the following test code:

program test
  use OMP_LIB
  implicit none
  integer::num_thread,nthreads
  integer::a=1
  integer(kind = OMP_lock_kind) :: lck !< a lock
  call omp_init_lock(lck)
  !$OMP PARALLEL SHARED(a,lck) PRIVATE(num_thread,nthreads)
  num_thread=OMP_GET_THREAD_NUM() !< le rang du thread 
  nthreads=OMP_GET_NUM_THREADS() !< le nombre de threads

  if (num_thread==0) then
     call omp_set_lock(lck)
     a=a 5
     a=a 7
     call omp_unset_lock(lck)
  end if
  
  !$OMP BARRIER 
  
  if (num_thread == 1) then
     a=a 1
  end if

  !$OMP BARRIER 
  if (num_thread == 2) then
     a=a 1
  end if

  !$OMP BARRIER 
  if (num_thread == 3) then
     a=a 1
  end if

  !$OMP END PARALLEL
  call omp_destroy_lock(lck)
  
  print*,'a is equal to: ',a
  
contains
  recursive subroutine system_sleep(wait)
    use,intrinsic :: iso_c_binding, only: c_int
    integer,intent(in) :: wait
    integer(kind=c_int):: waited
    interface
       function c_usleep(msecs) bind (C,name="usleep")
         import
         integer(c_int) :: c_usleep
         integer(c_int),intent(in),VALUE :: msecs
       end function c_usleep
    end interface
    if(wait.gt.0)then
       waited=c_usleep(int(wait,kind=c_int))
    endif
  end subroutine system_sleep

  recursive subroutine wait(full)
    logical,intent(in)::full
    do
       call system_sleep(1)
       if (full .eqv. .true.) EXIT

    end do
  end subroutine wait
end program test

As you can see, the threads are only updating the value of an integer a. I want to get rid of the first synchronization barrier and to replace it with a block of code. To do so, I thought about using the sleep function and the lock routines in order to avoid the concurrency problems. By executing this code, I get : a is equal to: 16.

The following code is the implementation without the first synchronization barrier:

program test
  use OMP_LIB
  implicit none
  integer::num_thread,nthreads
  integer::a=1
  integer(kind = OMP_lock_kind) :: lck !< a lock
  call omp_init_lock(lck)
  !$OMP PARALLEL SHARED(a,lck) PRIVATE(num_thread,nthreads)
  num_thread=OMP_GET_THREAD_NUM() !< le rang du thread 
  nthreads=OMP_GET_NUM_THREADS() !< le nombre de threads

  if (num_thread==0) then
     call omp_set_lock(lck)
     a=a 5
     a=a 7
     call omp_unset_lock(lck)
  end if

  if (num_thread .ne. 0) then  
     do
        call omp_set_lock(lck)
        if (a==13) then
           exit
        else
           call omp_unset_lock(lck)
           call system_sleep(1)
        end if            
     end do 
     call omp_unset_lock(lck)
  end if


  if (num_thread == 1) then
     a=a 1
  end if

  !$OMP BARRIER 
  if (num_thread == 2) then
     a=a 1
  end if

  !$OMP BARRIER 
  if (num_thread == 3) then
     a=a 1
  end if

  !$OMP END PARALLEL
  call omp_destroy_lock(lck)
  
  print*,'a is equal to: ',a
  
contains
  recursive subroutine system_sleep(wait)
    use,intrinsic :: iso_c_binding, only: c_int
    integer,intent(in) :: wait
    integer(kind=c_int):: waited
    interface
       function c_usleep(msecs) bind (C,name="usleep")
         import
         integer(c_int) :: c_usleep
         integer(c_int),intent(in),VALUE :: msecs
       end function c_usleep
    end interface
    if(wait.gt.0)then
       waited=c_usleep(int(wait,kind=c_int))
    endif
  end subroutine system_sleep

  recursive subroutine wait(full)
    logical,intent(in)::full
    do
       call system_sleep(1)
       if (full .eqv. .true.) EXIT

    end do
  end subroutine wait
end program test

When I run this code, I get a blinking cursor and no results displayed.

I don't understand why and how the threads approach this code. I would like to mention that the condition a==13 is due to the fact that thread number 0 (master) will add 12 to the initial value of a which is 1. We only leave the loop when the master thread finishes its calculation and sets a to the value 13.

I hope you can help me to make this code work.

CodePudding user response:

The problem is that the code

if (num_thread == 1) then
  a=a 1
end if

is not behind any kind of barrier, so it can happen on one thread while another thread is sleeping in the loop. This means that when the thread in the loop wakes up, a is larger than 13, and so the loop cannot be broken out of. This in turn means that the threads caught in the loop will never reach the !$OMP BARRIER, and so the program will hang forever.

This can be accounted for either by putting a barrier before the a=a 1 section, or by replacing the condition to exit the loop (if (a==13) then) with the more permissive if (a>=13) then.

You can identify these kinds of problems using a debugger, or by adding write statements throughout your code, e.g. as

if (num_thread==0) then
  call omp_set_lock(lck)
  a=a 5
  a=a 7
  write(*,*) 'a =12 done'
  call omp_unset_lock(lck)
end if

and

write(*,*) 'Thread ', num_thread, ' of ', nthreads, ' at barrier 1'
!$OMP BARRIER
  • Related