Filtering vector onto another vector in Parallel-CodePudding

I'm trying to filter a vector into another vector in parallel. My current setup generates too much overhead so it's even slower than serial. Concretely:

#pragma omp parallel for collapse(2)
for(int i = 0 ; i < a.size() ; i  ){
   for(int j = 0 ; j < a.size() ; j  ){
      if(meetsConditions(a[i], a[j])){
         std::vector<int> tmp = {i, j};
         #pragma omp critical
         b.push_back(tmp);
      }
   }
}

I'm saving the indices as I would like to later run a separate serial function on each couple that meets the condition:

for(auto element : b){
   doSmth(a[element[0]], a[element[1]]);
}

I tried doing it with a new empty vector, resizing it to a.size()*a.size(), allocating the elements using a third index that's in an atomic clause when it gets incremented, but that caused a data race (unless I saw it wrongly). How could I go about tackling this problem? Could maybe using lists make it easier? Or maybe storing pointers to these elements directly would make it easier? I'm really new to C so I'm not quite sure how I could get that to work.

CodePudding user response：

Assuming a.size() is large enough, I would parallelize only the first loop (no collapse), use a local b_local to each thread, and at the end concatenate all the b_local to the shared b (I am not skilled in C , so I'm not sure how to append several elements at once). This should be more efficient, as the critical is now outside of the loops

#pragma omp parallel
{
    //declare b_local here, same type as b
    #pragma omp for
    for (int i = 0 ; i < a.size() ; i  ){
        for(int j = 0 ; j < a.size() ; j  ){
            if(meetsConditions(a[i], a[j])){
                std::vector<int> tmp = {i, j};
                b_local.push_back(tmp);
            }
        }
    }
    #pragma omp critical
    b.insert(b.end(),b_local); // not sure about this one
}

Actually I'm not sure it's worth parallelizing this, unless meetsConditions() has many computations

CodePudding user response：

I would expect this code to run slower, as you have a a critcal section that is going to force the b.push_back(tmp) to run in a single thread.

Perhaps look to remove the critcal section and instead write the results directly to an already appropriatly sized b, something like:

vector<std::vector<int>> b;
b.resize(a.size() * a.size());

#pragma omp parallel for collapse(2)
for (int i = 0; i < a.size(); i  ) {
    for (int j = 0; j < a.size(); j  ) {
        if (meetsConditions(a[i], a[j])) {
            std::vector<int> tmp = { i, j };
            b.at((i*a.size()) j) = tmp;
        }
    }
}