I am trying to learn parallelization with OpenMP.
This code performs two-dimensional Monte Carlo integration.
Program execution time (with OpenMP) is about 25 seconds for N = 900 million.
#include <cmath>
#include <cstdint>
#include <iostream>
#include <omp.h>
#include <random>
double func(double x, double y) {
return std::sqrt(x * y);
}
using std::cout;
using std::endl;
const double a = 0.0, b = 1.0, c = 3.0, d = 5.0;
const uint64_t N = 900000000;
int main() {
double sum = 0.0;
std::random_device rd;
#pragma omp parallel
{
uint32_t seed;
#pragma omp critical
{
seed = rd();
}
std::mt19937 gen(seed);
std::uniform_real_distribution<double> dist(0.0, 1.0);
#pragma omp for reduction(+ : sum)
for (int64_t i = 0; i < N; ++i) {
double x = a + (b - a) * dist(gen);
double y = c + (d - c) * dist(gen);
sum += func(x, y);
// sum is local to each thread,
// initialized to 0 in each thread,
// and added to the global sum when the loop ends
}
}
// The result is calculated after the parallel section
double ans = sum * (b - a) * (d - c) / N;
cout << ans << endl;
return 0;
}
How can I make this code thread-safe (my professor said that it is not thread-safe)?
CodePudding user response:
As far as I can see, your professor is mistaken. The code is thread-safe, and the reduction clause is written correctly.
The only issue I have is with the way you initialize the random number generators.
- It is inefficient to call into random_device once per thread.
- It opens up the chance of a birthday problem: there is a (very minor) chance that two threads start with the same random state because they receive the same seed value.
The more efficient way to create multiple random sequences is to increment the seed by 1 per thread. A good random number generator will map these adjacent seeds to vastly different random sequences. See for example here: https://www.johndcook.com/blog/2016/01/29/random-number-generator-seed-mistakes/
Another minor issue is that you can move the reduction to the end of the parallel section. Then you can declare the for-loop as nowait to avoid one implicit openmp barrier at the end of the loop. This optimization is possible because you do not need the final sum inside the parallel section. However, this change is only an optimization. It does not change the correctness of your code.
The code would look something like this:
#include <cmath>
// using std::sqrt
#include <random>
// using std::random_device, std::mt19937, std::uniform_real_distribution
#include <iostream>
// using std::cout
#include <omp.h>
// using omp_get_thread_num
int main() {
const double a = 0.0, b = 1.0, c = 3.0, d = 5.0;
const int64_t N = 900000000;
double sum = 0.0;
std::random_device rd;
// initial seed
const std::random_device::result_type seed = rd();
# pragma omp parallel reduction(+ : sum)
{
std::uniform_real_distribution<double> dist(0.0, 1.0);
/*
* We add the thread number to the seed. This is better than using multiple
* random numbers because it avoids the birthday problem. See for example
* https://www.johndcook.com/blog/2016/01/29/random-number-generator-seed-mistakes/
*/
std::mt19937 gen(seed + static_cast<unsigned>(omp_get_thread_num()));
# pragma omp for nowait
for (int64_t i = 0; i < N; ++i) {
const double x = a + (b - a) * dist(gen);
const double y = c + (d - c) * dist(gen);
sum += std::sqrt(x * y);
}
}
const double ans = sum * (b - a) * (d - c) / N;
std::cout << ans << '\n';
return 0;
}
Also note how you can indent the pragmas to improve readability. Just keep the # at the start of the line.