I'm quite new to using OpenMP and to Stack Overflow, so apologies if this is a stupid question!
I'm trying to set up a large 2D vector to test my CUDA program. These large vectors are created by looping through all the values of the given dimensions (stored in their own vectors) and adding a row to a new vector for each one, covering all possible combinations. Obviously the time taken to do this increases exponentially as you increase the number of dimensions, so I am looking to parallelize it.
Originally I thought the problem might be an incompatibility between OpenMP and the Thrust library's host_vectors, so I switched to using normal vectors, but the problem persists. Here is the full function:
    thrust::host_vector<thrust::host_vector<float>> parallel_create_search_grid(
        thrust::host_vector<float> d1,
        thrust::host_vector<float> d2,
        thrust::host_vector<float> d3,
        thrust::host_vector<float> d4) {
        std::vector<std::vector<float>> final2;
        #pragma omp parallel shared(d1, d2, d3, d4, final2)
        {
            int j, k, l;
            std::vector<float> temp(4);
            thrust::host_vector<float> h_temp;
            #pragma omp for
            for (int i = 0; i < d1.size(); i++)
            {
                for (j = 0; j < d1.size(); j++)
                {
                    for (k = 0; k < d1.size(); k++)
                    {
                        for (l = 0; l < d1.size(); l++)
                        {
                            temp[0] = d1[i];
                            temp[1] = d2[j];
                            temp[2] = d3[k];
                            temp[3] = d4[l];
                            std::cout << i << "," << j << "," << k << "," << l << std::endl;
                            final2.push_back(temp);
                        }
                    }
                }
            }
        }
        return final2;
    }
It doesn't break immediately; it prints out many iterations before an exception is thrown, giving me the following:
Exception thrown: read access violation. this->_Myproxy was 0xFFFFFFFFFFFFFFFF.
The source of the exception is the following function in xmemory, but what it means is beyond me:
    _CONSTEXPR20_CONTAINER void _Container_base12::_Swap_proxy_and_iterators_unlocked(_Container_base12& _Right) noexcept {
        _Container_proxy* _Temp = _Myproxy;
        _Myproxy = _Right._Myproxy;
        _Right._Myproxy = _Temp;
        if (_Myproxy) {
            _Myproxy->_Mycont = this;
        }
        if (_Right._Myproxy) {
            _Right._Myproxy->_Mycont = &_Right;
        }
    }
Any help would be greatly appreciated. Thank you!
CodePudding user response:
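Your problem is the line `final2.push_back(temp)`. You have multiple threads pushing back into the same vector at the same time. `std::vector` is not safe for concurrent modification, so this is a data race: the threads corrupt the vector's internal state, which is why the crash only shows up after a number of iterations. Create the vector at its full size before the loop, and then have each thread write to explicit locations in it.
In general, try to avoid `push_back` as much as possible, because it can force the vector to reallocate and copy its contents as it grows. If you know the size of a vector, create it with that size. Scientific applications hardly ever need the dynamism of `push_back`.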
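A minimal sketch of that suggestion, assuming plain `std::vector` inputs rather than Thrust vectors. The function name `create_search_grid` and the row-index formula are illustrative choices, not taken from the question. The inner loops here run over `d2`, `d3` and `d4` respectively, on the assumption that this is what the original loops were meant to do. If the compiler is not given an OpenMP flag, the pragma is ignored and the loop simply runs serially with the same result.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical replacement for the question's function: builds every
// combination of values from d1..d4, one combination per row.
std::vector<std::vector<float>> create_search_grid(
    const std::vector<float>& d1,
    const std::vector<float>& d2,
    const std::vector<float>& d3,
    const std::vector<float>& d4) {
    const int n1 = static_cast<int>(d1.size());
    const std::size_t n2 = d2.size();
    const std::size_t n3 = d3.size();
    const std::size_t n4 = d4.size();

    // Allocate all rows up front, so no thread ever resizes the outer vector.
    std::vector<std::vector<float>> grid(d1.size() * n2 * n3 * n4,
                                         std::vector<float>(4));

    // Each value of i owns a disjoint block of n2*n3*n4 rows, so two threads
    // never write to the same element. A signed loop index is used because
    // older OpenMP implementations require it.
    #pragma omp parallel for
    for (int i = 0; i < n1; ++i) {
        const std::size_t ui = static_cast<std::size_t>(i);
        std::size_t row = ui * n2 * n3 * n4;  // first row of this thread's block
        for (std::size_t j = 0; j < n2; ++j) {
            for (std::size_t k = 0; k < n3; ++k) {
                for (std::size_t l = 0; l < n4; ++l, ++row) {
                    grid[row][0] = d1[ui];
                    grid[row][1] = d2[j];
                    grid[row][2] = d3[k];
                    grid[row][3] = d4[l];
                }
            }
        }
    }
    return grid;
}
```

Because every row index is computed from the loop counters, the output order is the same no matter how the iterations are split across threads. The `std::cout` call from the question is left out on purpose: output from several threads would interleave, and printing inside the innermost loop would likely dominate the run time.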