I'm trying to filter a vector into another vector in parallel. My current setup generates too much overhead so it's even slower than serial. Concretely:
#pragma omp parallel for collapse(2)
for(int i = 0 ; i < a.size() ; i ){
for(int j = 0 ; j < a.size() ; j ){
if(meetsConditions(a[i], a[j])){
std::vector<int> tmp = {i, j};
#pragma omp critical
b.push_back(tmp);
}
}
}
I'm saving the indices as I would like to later run a separate serial function on each couple that meets the condition:
for(auto element : b){
doSmth(a[element[0]], a[element[1]]);
}
I tried doing it with a new empty vector, resizing it to a.size()*a.size(), allocating the elements using a third index that's in an atomic clause when it gets incremented, but that caused a data race (unless I saw it wrongly). How could I go about tackling this problem? Could maybe using lists make it easier? Or maybe storing pointers to these elements directly would make it easier? I'm really new to C so I'm not quite sure how I could get that to work.
CodePudding user response:
Assuming a.size()
is large enough, I would parallelize only the first loop (no collapse), use a local b_local to each thread, and at the end concatenate all the b_local to the shared b (I am not skilled in C , so I'm not sure how to append several elements at once). This should be more efficient, as the critical
is now outside of the loops
#pragma omp parallel
{
//declare b_local here, same type as b
#pragma omp for
for (int i = 0 ; i < a.size() ; i ){
for(int j = 0 ; j < a.size() ; j ){
if(meetsConditions(a[i], a[j])){
std::vector<int> tmp = {i, j};
b_local.push_back(tmp);
}
}
}
#pragma omp critical
b.insert(b.end(),b_local); // not sure about this one
}
Actually I'm not sure it's worth parallelizing this, unless meetsConditions() has many computations
CodePudding user response:
I would expect this code to run slower, as you have a a critcal section that is going to force the b.push_back(tmp) to run in a single thread.
Perhaps look to remove the critcal section and instead write the results directly to an already appropriatly sized b, something like:
vector<std::vector<int>> b;
b.resize(a.size() * a.size());
#pragma omp parallel for collapse(2)
for (int i = 0; i < a.size(); i ) {
for (int j = 0; j < a.size(); j ) {
if (meetsConditions(a[i], a[j])) {
std::vector<int> tmp = { i, j };
b.at((i*a.size()) j) = tmp;
}
}
}