I have a C program that uses OpenMP, shown below. It computes pi from a given number of steps. I am new to OpenMP, so my knowledge is limited.
I'm trying to add a barrier to this program, but I believe one is already implicit, so I'm not sure whether I need an explicit one at all.
Thank you!
#include <omp.h>
#include <stdio.h>
#define NUM_THREADS 4
static long num_steps = 100000000;
double step;
int main()
{
    int i;
    double start_time, run_time, pi, sum[NUM_THREADS];
    omp_set_num_threads(NUM_THREADS);
    step = 1.0 / (double)num_steps;
    start_time = omp_get_wtime();
    #pragma omp parallel
    {
        int i, id, currentThread;
        double x;
        id = omp_get_thread_num();
        currentThread = omp_get_num_threads();
        for (i = id, sum[id] = 0.0; i < num_steps; i = i + currentThread)
        {
            x = (i + 0.5) * step;
            sum[id] = sum[id] + 4.0 / (1.0 + x * x);
        }
    }
    run_time = omp_get_wtime() - start_time;
    // combine the partial sums to get the value of pi
    for (i = 0, pi = 0.0; i < NUM_THREADS; i++)
    {
        pi = pi + sum[i] * step;
    }
    printf("\n pi with %ld steps is %lf \n ", num_steps, pi);
    printf("run time = %6.6f seconds\n", run_time);
}
CodePudding user response:
In your case there is no need for an explicit barrier: there is an implicit barrier at the end of the parallel region.
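For contrast, here is a minimal sketch of a case where an explicit barrier would matter. The function `two_phase_demo` and the arrays `a` and `b` are hypothetical and not part of your program. The `nowait` clause removes the implicit barrier that normally ends an `omp for`, so an explicit `#pragma omp barrier` is needed before other threads' results are read. The closing brace of the parallel region still carries its own implicit barrier.

```c
#define N 1000   /* hypothetical array size for this sketch */

/* Returns 1 if phase 2 saw every value written in phase 1. */
int two_phase_demo(void)
{
    static int a[N], b[N];
    int ok = 1;

    #pragma omp parallel
    {
        /* Phase 1: nowait removes the implicit barrier that normally ends an omp for. */
        #pragma omp for nowait
        for (int i = 0; i < N; i++)
            a[i] = i;

        /* Explicit barrier: every a[i] must be written before phase 2 reads it. */
        #pragma omp barrier

        /* Phase 2: each iteration reads an element that another thread may have written. */
        #pragma omp for
        for (int i = 0; i < N; i++)
            b[i] = a[N - 1 - i];
    }   /* implicit barrier: all threads have finished when execution continues here */

    /* Serial check after the region. */
    for (int i = 0; i < N; i++)
        if (b[i] != N - 1 - i)
            ok = 0;
    return ok;
}
```

In your program the partial sums are only read after the parallel region has ended, so nothing like this is required. Dropping `nowait` would also make the explicit barrier in this sketch unnecessary.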
Your code, however, has a performance issue. Different threads update adjacent elements of the sum array, which can cause false sharing: when multiple threads access the same cache line and at least one of them writes to it, the result is costly invalidation misses and upgrades.
To avoid it, you have to make sure that each element of the sum array is located on a different cache line, but there is a simpler solution: use OpenMP's reduction clause. Please check this example suggested by @JeromeRichard. Using a reduction, your code should look something like this:
double sum = 0.0;
#pragma omp parallel for reduction(+:sum)
for (int i = 0; i < num_steps; i++)
{
    const double x = (i + 0.5) * step;
    sum += 4.0 / (1.0 + x * x);
}
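If you would rather keep the per-thread array, the padding approach could be sketched roughly as follows. The names `padded_sum`, `pi_padded`, and `CACHE_LINE` are hypothetical, and the 64-byte cache line is an assumption that holds on many x86 CPUs but should be checked for your hardware. The `#ifdef _OPENMP` guards are only there so the sketch still builds, serially, without `-fopenmp`.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

#define NUM_THREADS 4
#define CACHE_LINE 64   /* assumed cache-line size in bytes; check your CPU */

/* One partial sum per thread, padded so consecutive values are 64 bytes apart. */
typedef struct {
    double value;
    char pad[CACHE_LINE - sizeof(double)];
} padded_sum;

double pi_padded(long num_steps)
{
    padded_sum sum[NUM_THREADS] = {{0}};   /* every slot starts at zero */
    const double step = 1.0 / (double)num_steps;

    #pragma omp parallel num_threads(NUM_THREADS)
    {
        int id = 0, nthreads = 1;   /* serial fallback when built without OpenMP */
#ifdef _OPENMP
        id = omp_get_thread_num();
        nthreads = omp_get_num_threads();
#endif
        /* Cyclic distribution of iterations, as in the original program. */
        for (long i = id; i < num_steps; i += nthreads) {
            const double x = (i + 0.5) * step;
            sum[id].value += 4.0 / (1.0 + x * x);
        }
    }   /* implicit barrier: all partial sums are complete here */

    double pi = 0.0;
    for (int t = 0; t < NUM_THREADS; t++)
        pi += sum[t].value * step;
    return pi;
}
```

Because the array is zero-initialized up front, unused slots contribute nothing if the runtime grants fewer than `NUM_THREADS` threads. The reduction version is still shorter and does not depend on a cache-line size.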
Note also that you should declare your variables in the smallest scope that needs them.
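As a rough illustration of the scoping advice, the whole computation could be wrapped in a function where nothing is global. The name `pi_reduction` is hypothetical. Here `i` lives only in the loop header, `x` only in the loop body, and `step` is a local constant, so no variable is accidentally shared between threads.

```c
/* Midpoint-rule estimate of pi as the integral of 4 / (1 + x^2) over [0, 1]. */
double pi_reduction(long num_steps)
{
    const double step = 1.0 / (double)num_steps;
    double sum = 0.0;   /* combined across threads by the reduction clause */

    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < num_steps; i++) {
        const double x = (i + 0.5) * step;   /* private: declared inside the loop body */
        sum += 4.0 / (1.0 + x * x);
    }   /* implicit barrier: sum holds the full total after the loop */

    return sum * step;
}
```

You would call it from `main` with something like `pi_reduction(100000000)` and compile with your compiler's OpenMP flag (for GCC, `-fopenmp`).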