Here's my code, which lets different threads each compute part of a conv2d and then merge their results back into the result matrix.
#pragma omp parallel private(tid)
float *gptr;
gptr = malloc(M * M * sizeof(float) / NUMTHREADS);
tid = omp_get_thread_num();
#pragma omp for
for (int i = 0; i < M; i++)
{
    for (int j = 0; j < M; j++)
    {
        float tmp = 0.;
        for (int k = 0; k < GW; k++)
        {
            int ii = i + k - W2;
            for (int l = 0; l < GW; l++)
            {
                int jj = j + l - W2;
                if (ii >= 0 && ii < M && jj >= 0 && jj < M)
                {
                    tmp += float_m[k * M + l] * GK[ii * GW + jj];
                }
            }
        }
        *(gptr + (i - tid * M / NUMTHREADS) * M + j) = tmp;
    }
}
But the directive #pragma omp parallel private(tid) didn't work properly. The compiler reports an error for the float declaration on the next line:
.\omp.c: In function 'main':
.\omp.c:86:5: error: expected expression before 'float'
     float *gptr;
     ^~~~~
Where did this go wrong, and how do I fix it?
CodePudding user response:
Your parallel region consists of more than a single statement, so you have to enclose it in curly braces:
#pragma omp parallel private(tid)
{
    // your code
}
UPDATE - a more precise answer with references:
From the OpenMP specification, the syntax of the parallel construct is as follows:
#pragma omp parallel [clause[ [,] clause] ... ] new-line
structured-block
The structured block is:
an executable statement, possibly compound, with a single entry at the top and a single exit at the bottom, or an OpenMP construct.
The definition of compound statement:
A compound statement, or block, is a brace-enclosed sequence of statements and declarations.
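For example, this minimal program (an illustration, not taken from the question) uses a compound statement as the structured block, which is why declarations are allowed inside the parallel region:

#include <omp.h>
#include <stdio.h>

int main(void)
{
    #pragma omp parallel
    {                                         /* braces form a compound statement */
        int tid = omp_get_thread_num();       /* a declaration is fine inside it  */
        printf("hello from thread %d\n", tid);
    }
    return 0;
}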
In your code
#pragma omp parallel private(tid)
float *gptr;
the declaration float *gptr; is not an executable statement, a compound statement, or an OpenMP construct, therefore you get the error message. You have to create a compound statement by putting your code between { and }.
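Applied to the code from the question, the minimal fix is just to wrap the whole region in braces (a sketch, assuming M, GW, W2, NUMTHREADS, float_m and GK are declared elsewhere in your program, and that stdlib.h and omp.h are included):

#pragma omp parallel private(tid)
{   /* the braces make the whole region a single compound statement */
    float *gptr = malloc(M * M * sizeof(float) / NUMTHREADS);
    tid = omp_get_thread_num();

    #pragma omp for
    for (int i = 0; i < M; i++)
    {
        for (int j = 0; j < M; j++)
        {
            float tmp = 0.;
            for (int k = 0; k < GW; k++)
            {
                int ii = i + k - W2;
                for (int l = 0; l < GW; l++)
                {
                    int jj = j + l - W2;
                    if (ii >= 0 && ii < M && jj >= 0 && jj < M)
                        tmp += float_m[k * M + l] * GK[ii * GW + jj];
                }
            }
            *(gptr + (i - tid * M / NUMTHREADS) * M + j) = tmp;
        }
    }

    free(gptr);   /* free each thread's private buffer once its results
                     have been merged (merge code not shown in the question) */
}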
CodePudding user response:
Your immediate problem is that you need curly braces around the body of the parallel region. Then consider putting a collapse(2) on the i,j loops. But are you sure that allocating gptr in the parallel region is what you want? It means that each thread creates its own copy, which stays local to the parallel region. You probably want to allocate the buffer outside and pass the pointer in as shared.
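A sketch of that alternative, assuming a hypothetical shared output buffer out of size M*M in place of the per-thread gptr buffers; because each (i, j) element is written by exactly one thread, no separate merge step is needed:

float *out = malloc((size_t)M * M * sizeof(float));   /* allocated outside the region, shared by default */

#pragma omp parallel for collapse(2)
for (int i = 0; i < M; i++)
{
    for (int j = 0; j < M; j++)
    {
        float tmp = 0.;
        for (int k = 0; k < GW; k++)
        {
            int ii = i + k - W2;
            for (int l = 0; l < GW; l++)
            {
                int jj = j + l - W2;
                if (ii >= 0 && ii < M && jj >= 0 && jj < M)
                    tmp += float_m[k * M + l] * GK[ii * GW + jj];
            }
        }
        out[i * M + j] = tmp;   /* each (i, j) is written by exactly one thread */
    }
}
/* ... use out ..., then free(out); */

collapse(2) fuses the i and j loops into a single iteration space of M*M work items, which can balance better across threads than parallelizing the outer loop alone.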