Good day, everyone! Not long ago I managed to parallelize a recursive algorithm that searches through the possible ways of combining some events. At the moment the code looks like this:
// #include directives
// function declarations
// global variable declaration (std::vector could be used instead of QVector):
QVector<QVector<QVector<float>>> variant;

int main() {
    // read data from file
    // convert and analyze the data
    // fill in variant, which holds the current best result (initially from the pre-analysis)
    #pragma omp parallel shared(variant)
    #pragma omp master
    // call the recursive algorithm that searches through all variants:
    PEREBOR(Tabl_1, a, i_a, ..., rec_depth);
    return 0;
}

void PEREBOR(QVector<QVector<uint8_t>> Tabl_1, QVector<A_struct> a, uint8_t i_a, ..., uint8_t rec_depth)
{
    // determine the bounds of the first loop
    for (int i = quantity; i < another_quantity; i++) {
        // Tabl_1 is processed and modified to determine the number of iterations of the inner loop
        for (int k = 0; k < the_quantity_just_found; k++) {
            if (rec_depth != 1) { // not at the lowest level yet, so descend further
                // queue the descent to the next recursion level as a task:
                #pragma omp task
                PEREBOR(Tabl_1_COPY, a, i_a, ..., rec_depth - 1);
            }
            else { // the lowest level has been reached
                if (condition fulfilled) // condition check: READS the variant variable
                    variant = it_is_equal_to_that_,_to_that...;
                else
                    continue;
            }
        }
    }
}
At the moment this works really well: on a six-core CPU it gives a speedup of more than 5.7x over the single-threaded version. As you can see, with a sufficiently large number of threads a failure may occur because the variant variable is read and written simultaneously. I understand that it needs to be protected. Right now the only way out I see is to use the lock routines, because a critical section does not seem suitable: the variant variable is written in only one place in the code (at the lowest recursion level), but it is read in many places. So here is the question. If I use the following constructs:
omp_lock_t lock;

int main() {
    ...
    omp_init_lock(&lock);
    #pragma omp parallel shared(variant, lock)
    ...
}
...
else { // the lowest level has been reached
    if (condition fulfilled) { // condition check: READS the variant variable
        omp_set_lock(&lock);
        variant = it_is_equal_to_that_,_to_that...;
        omp_unset_lock(&lock);
    }
    else
        continue;
...
will this lock also protect reads of the variable in all the other places? Or do I need to check the lock state manually and pause the thread before reading elsewhere? I would be very grateful to the community for any help!
CodePudding user response:
In the OpenMP specification (section 1.4.1, "Structure of the OpenMP Memory Model") you can read:
The OpenMP API provides a relaxed-consistency, shared-memory model. All OpenMP threads have access to a place to store and to retrieve variables, called the memory. In addition, each thread is allowed to have its own temporary view of the memory. The temporary view of memory for each thread is not a required part of the OpenMP memory model, but can represent any kind of intervening structure, such as machine registers, cache, or other local storage, between the thread and the memory. The temporary view of memory allows the thread to cache variables and thereby to avoid going to memory for every reference to a variable.
In practice this means that (as with any relaxed memory model) threads are guaranteed to have the same, consistent view of the value of shared variables only at well-defined points. In between such points, the temporary view may differ across threads.
In your code you have handled the problem of simultaneous writes to the same variable, but without additional measures there is no guarantee that another thread reads the correct value of the variable.
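To make the idea of a "well-defined point" concrete, here is a minimal sketch, assuming that a `taskwait` is used as the synchronization point: the child task has completed by then, and a flush is implied at task scheduling points such as `taskwait`. The function name `value_seen_after_taskwait` and both variables are hypothetical and are not part of the code in the question.

```cpp
// Hypothetical helper: returns the value of shared_value that the parent
// observes once the child task has been waited for.
int value_seen_after_taskwait() {
    int shared_value = 0;   // written by the child task
    int seen = -1;          // what the parent reads after the taskwait

    #pragma omp parallel shared(shared_value, seen)
    #pragma omp single
    {
        // The child task may run on any thread of the team.
        #pragma omp task shared(shared_value)
        shared_value = 42;

        // Synchronization point: the child task is complete here, so the
        // read below is ordered after the write above.
        #pragma omp taskwait
        seen = shared_value;
    }
    return seen;
}
```

Without the `taskwait`, the read of `shared_value` would race with the write in the task, which is the situation the quoted paragraph warns about.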
You have three options. Note that each of them not only handles simultaneous reads and writes, but also provides a consistent view of the value of the shared variable:
- If your variable is of a scalar type, the best solution is to use atomic operations. This is the fastest option, as atomic operations are typically supported by the hardware.
#pragma omp parallel
{
    ...
    #pragma omp atomic read
    tmp = variant;
    ...
    #pragma omp atomic write
    variant = new_value;
}
- Use the critical construct. This solution can be used if your variable has a complex type (such as a class) and its reads and writes cannot be performed atomically. Note that it is much less efficient (slower) than an atomic operation.
#pragma omp parallel
{
    ...
    #pragma omp critical
    tmp = variant;
    ...
    #pragma omp critical
    variant = new_value;
}
- Use locks for each read and write of your variable. Your code is OK for the write, but you have to take the same lock around the reads as well; setting a lock in one place does not block code that accesses the variable without acquiring it. This option requires the most coding, but in practice the result is the same as with the critical construct. Note that OpenMP implementations typically use locks to implement critical constructs.
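The lock option could look roughly like the sketch below. It is only an illustration under stated assumptions: `Best`, `best`, `best_lock`, `read_best_score`, `offer` and `run_search` are hypothetical names, a `std::vector<float>` plus a score stands in for the `variant` variable, and the score formula is made up. The `#else` branch provides serial stand-ins so the sketch also builds when OpenMP is not enabled.

```cpp
#include <vector>

#ifdef _OPENMP
#include <omp.h>
#else
// Assumption: serial stand-ins so the sketch still builds without OpenMP.
typedef int omp_lock_t;
inline void omp_init_lock(omp_lock_t*) {}
inline void omp_destroy_lock(omp_lock_t*) {}
inline void omp_set_lock(omp_lock_t*) {}
inline void omp_unset_lock(omp_lock_t*) {}
#endif

// Hypothetical stand-in for the shared "current best result".
struct Best { std::vector<float> values; float score; };
static Best best{ {}, -1.0f };
static omp_lock_t best_lock;

// Every read takes the same lock that the writer uses.
float read_best_score() {
    omp_set_lock(&best_lock);
    float s = best.score;
    omp_unset_lock(&best_lock);
    return s;
}

// The comparison and the update happen under one lock acquisition, so
// another thread cannot change `best` between the check and the write.
void offer(const std::vector<float>& candidate, float score) {
    omp_set_lock(&best_lock);
    if (score > best.score) {
        best.values = candidate;
        best.score = score;
    }
    omp_unset_lock(&best_lock);
}

// Spawns n tasks; each one scores a made-up candidate and offers it.
float run_search(int n) {
    omp_init_lock(&best_lock);
    best = Best{ {}, -1.0f };

    #pragma omp parallel
    #pragma omp single
    for (int i = 0; i < n; ++i) {
        #pragma omp task firstprivate(i)
        {
            float score = static_cast<float>((i * 37) % 101);
            if (score > read_best_score())   // protected pre-check
                offer(std::vector<float>(3, score), score);
        }
    }
    // All tasks have finished at the implicit barrier that ends the region.
    omp_destroy_lock(&best_lock);
    return best.score;
}
```

Calling `run_search(101)` scores 101 candidates in parallel tasks. The pre-check in the task is only an optimization; `offer` repeats the comparison while holding the lock, so a stale pre-check cannot cause a worse candidate to overwrite a better one.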