I have been experimenting with std::async. I wrote a (terribly inefficient) function that counts all prime numbers up to a given limit. Without std::async, the function always gives the expected outcome, as judged by a ctest unit test. Using std::async but no std::lock_guard/mutex, there is (as expected) a big deviation between the expected value and the calculated one. Using std::lock_guard/mutex, the outcome is still irreproducible: it is correct in around 50% of the tests, and in the other 50% the value is off by 1-2 (e.g. 9151 instead of 9152).
I am wondering whether the problem is an additional bottleneck introduced by my isPrime() function. Alternatively, I suspect that the program (sometimes) terminates prematurely, before the last thread has done its work. In either case, I don't understand why the std::lock_guard does not seem to protect the variable count.
// primeAsync.cc
#include <future>
#include <mutex>
#include <vector>
#include "primeAsync.h"
#include "prime.h"

// mutex for thread safety
static std::mutex s_PrimeMutex;
// handle to store futures
static std::vector<std::future<void>> s_Futures;

static void countPrimesHelper(int* count, const long long number);

// passes ctest every other time, count is off by 1-2 (e.g. 9151 instead of 9152)
int countPrimesAsync(const long long limit) {
    int count = 0;
    for(long i = 2; i < limit; ++i) {
        s_Futures.push_back(std::async(std::launch::async, countPrimesHelper, &count, i));
    }
    return count;
}

// helper function for countPrimesAsync, std::async
static void countPrimesHelper(int* count, const long long number) {
    if(isPrime(number)) {
        const std::lock_guard<std::mutex> lock(s_PrimeMutex);
        ++(*count);
    }
}
// prime.cc
#include <iostream>
#include <vector>

// always passes ctest
bool isPrime(const long long n) {
    int mod;
    for(long long m = n - 1; m > 1; --m) {
        mod = n % m;
        if(mod == 0) {
            return false;
        }
    }
    return true;
}

// always passes ctest
int countPrimes(const long long limit) {
    int count = 0;
    for(long i = 2; i < limit; ++i) {
        if(isPrime(i)) {
            ++count;
        }
    }
    return count;
}
CodePudding user response:
Another thing people haven't mentioned is that atomic variables let a value be read and written by multiple threads without race conditions, and they usually do so faster than a lock guard.
For example:
#include <atomic>

std::atomic<int> count;

static void countPrimesHelper(std::atomic<int>* count, const long long number) {
    if(isPrime(number)) {
        (*count) += 1;
    }
}
This is race-condition free, even with a million threads accessing count. It works by performing the increment as a single operation whose intermediate state can never be observed. On a recent processor this is typically implemented with a low-level locked instruction that lasts for just that one operation, so other cores cannot interleave their own accesses with the read-modify-write.
Here is more information on the atomic header: https://en.cppreference.com/w/cpp/atomic
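For completeness, a rough sketch of how the counting function could look with an atomic counter (my own illustration, with made-up names countPrimesAsyncAtomic / countPrimesHelperAtomic, assuming the isPrime from the question). Note that it still has to wait for the futures before returning; the atomic only makes the increment safe, it does not keep the function from returning too early:

#include <atomic>
#include <future>
#include <vector>
#include "prime.h"

// increments the shared counter without a mutex
static void countPrimesHelperAtomic(std::atomic<int>* count, const long long number) {
    if(isPrime(number)) {
        count->fetch_add(1); // a single atomic read-modify-write
    }
}

int countPrimesAsyncAtomic(const long long limit) {
    std::atomic<int> count{0};
    std::vector<std::future<void>> futures;
    for(long long i = 2; i < limit; ++i) {
        futures.push_back(std::async(std::launch::async, countPrimesHelperAtomic, &count, i));
    }
    for(auto& f : futures) {
        f.wait(); // the count is only complete once every task has finished
    }
    return count.load();
}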
CodePudding user response:
I changed the function to wait for all futures to finish before count is returned:
// now seems to pass ctest every time
int countPrimesAsync(const long long limit) {
    int count = 0;
    for(long i = 2; i < limit; ++i) {
        s_Futures.push_back(std::async(std::launch::async, countPrimesHelper, &count, i));
    }
    for(std::size_t i = 0; i < s_Futures.size(); ++i) {
        s_Futures[i].wait();
    }
    return count;
}
That solution passes the test!
Alternatively, I found the following solution to check the status of a future:
Get the status of a std::future
Not sure which one is preferred (if any).
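Checking the status typically boils down to calling wait_for with a zero timeout, roughly like this (my own sketch, not code from the linked answer):

#include <chrono>
#include <future>

// returns true if the task behind the (valid) future has already finished
bool isReady(const std::future<void>& f) {
    return f.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
}

Polling like this is mainly useful if the function has other work to do while the tasks run; if it only needs the final count, waiting on (or destroying) the futures is simpler.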
Thanks already though!
In line with François Andrieux's suggestion, this solution also works and passes the ctest, does not require a static variable, and I think is more elegant than a second for loop:
// now seems to pass ctest every time
int countPrimesAsync(const long long limit) {
    // local handle to store futures
    std::vector<std::future<void>> futures;
    int count = 0;
    for(long i = 2; i < limit; ++i) {
        futures.push_back(std::async(std::launch::async, countPrimesHelper, &count, i));
    }
    // destroying the futures blocks until every task has finished
    futures.clear();
    return count;
}
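The reason the clear() call is enough is that a std::future obtained from std::async blocks in its destructor until the associated task has finished. A minimal stand-alone demonstration of that behaviour (my own illustration, not from the post):

#include <chrono>
#include <future>
#include <iostream>
#include <thread>

int main() {
    {
        auto f = std::async(std::launch::async, [] {
            std::this_thread::sleep_for(std::chrono::milliseconds(100));
            std::cout << "task finished\n";
        });
    } // ~future() runs here and waits for the task to complete
    std::cout << "reached only after the task has finished\n";
}

So clearing the vector (or simply letting it go out of scope) guarantees that every helper has run before count is returned.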
}