I am trying to parallelize a for loop in my code base that should be embarrassingly parallel. However, OpenMP is not parallelizing it; everything still executes in sequential order. The program is compiled with g++ using -std=c++11. I ran a small test program to check whether OpenMP works at all, and it worked just fine.
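A minimal version of that kind of check looks like this; compiled with g++ -fopenmp it prints one line per thread, and without the flag it prints a single fallback line:

#include <cstdio>
#ifdef _OPENMP
#include <omp.h>
#endif

int main()
{
    #pragma omp parallel
    {
#ifdef _OPENMP
        // One line per thread when OpenMP is enabled.
        std::printf("thread %d of %d\n", omp_get_thread_num(), omp_get_num_threads());
#else
        // Falls back to a single line when the pragma is compiled out.
        std::printf("OpenMP is not enabled\n");
#endif
    }
    return 0;
}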
The code block I am trying to parallelize is given below:
void class_tmv::activate(class_tmv &result, const &a, const &b, const &c, const &d, const &e, f) const
{
    result.clear();

    #pragma omp parallel for
    for (unsigned int i = 0; i < tms.size(); i++)
    {
        Class_TM tmTemp;
        tms[i].activate(tmTemp, a, b, c, d, e, f);
        result.tms[i] = tmTemp;
    }
}
class_tmv has a member variable tms, which is essentially a vector of Class_TM objects. Class_TM has a method, also named activate, that gets called above; it is defined as:
inline void Class_TM::activate(Class_TM &result, const &a, const &b, const &c, const &d, const &e, f) const
{
    result.clear();

    Class_TM tmTemp;

    if (condition_1)
    {
        this->S_T(tmTemp, a, b, c, d, e, f);
    }
    else if (condition_2)
    {
        this->T_T(tmTemp, a, b, c, d, e, f);
    }
    else
    {
        cout << "The activation function cannot be parsed." << endl;
    }

    result = tmTemp;
}
S_T and T_T are other methods in Class_TM.
The issue I'm having is that the overall execution of the program is completely sequential; the loop I'm trying to parallelize is not being run in parallel.
Any suggestions on what may be going wrong would be extremely helpful. Solutions not involving OpenMP are also welcome.
(This is my first time working on parallel applications)
CodePudding user response:
Did you use the -fopenmp flag when compiling your code?
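Without that flag, GCC ignores the omp pragmas by default (no warning is emitted unless you enable one), so the program still compiles and runs, just sequentially. The full command would look something like g++ -std=c++11 -fopenmp main.cpp -o main, where the file names are just stand-ins.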
CodePudding user response:
Your full code, or more context on the command used to compile, would be helpful, but here are some pointers in the meantime:

- As the other answer mentioned, you should compile with -fopenmp or similar (depending on the compiler). However, since you mention that you ran a test and verified that OpenMP works correctly, you likely did include that option, and also the headers in your source files.

- I noticed that in the for loop where you attempt to use the pragma you use ++i, so that means you are skipping the 0th element of tms and result.tms. I don't know if this is what you want, or if you instead need i++.

- As you seem to be experiencing, if you compile with OpenMP #pragmas but the compiler sees no way to parallelize your for loop, or the loop is not in the "canonical form", it will not be parallelized. The canonical form of a for loop requires, among other things, that the stopping criterion is fixed and does not change across iterations (in this case tms.size()). If the call to tms[i].activate() (or the subsequent calls to S_T and T_T) modifies the size of tms, the loop would not follow canonical form and would not be parallelized. Check whether this applies in your context; a sketch in canonical form follows this list.
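For illustration, here is a sketch of that loop in canonical form (not your exact code, since the parameter types are elided above). It hoists the bound into a local variable, and it also sizes result.tms before the parallel region, on the assumption that result.clear() empties the vector; if it does, the writes to result.tms[i] in your version would be out of bounds:

void class_tmv::activate(class_tmv &result, /* ...same parameters as above... */) const
{
    result.clear();

    // Assumption: clear() empties result.tms, so the slots must exist
    // before the threads write into them; size the vector once, up front.
    result.tms.resize(tms.size());

    // Fixed bound, evaluated once: keeps the loop in canonical form even
    // if the compiler cannot prove tms.size() is loop-invariant.
    const std::size_t n = tms.size();

    #pragma omp parallel for
    for (std::size_t i = 0; i < n; i++)
    {
        Class_TM tmTemp;                           // private to each iteration
        tms[i].activate(tmTemp, /* ...same arguments... */);
        result.tms[i] = tmTemp;                    // each iteration writes a distinct slot: no race
    }
}

If S_T and T_T never touch shared state, the body is embarrassingly parallel as you expect; if they do, those methods also need to be made thread-safe.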