Can we offload a double pointer to the GPU using OpenMP?

I am new to OpenMP and I am trying to parallelize a simple code with a double loop like this one:

for (int i=0; i<n; i++){
    for (int j=0; j<n; j++){
        c[i][j] = a[i][j] + b[i][j];
    }
}

The types of a, b and c are, and must stay, double**.

I tried to convert my code to this:

#pragma omp target teams distribute parallel for collapse(2) \
    map(to: a[0:n][0:n],b[0:n][0:n]) map(from: c[0:n][0:n])
for (int i=0; i<n; i++){
    for (int j=0; j<n; j++){
        c[i][j] = a[i][j] + b[i][j];
    }
}

But I get Aborted (core dumped), could somebody help me please?

Answer:

First of all, note that compilers such as GCC report the problem explicitly with the message: array section is not contiguous in 'map' clause.

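As said in the comments, you need to transfer each contiguous block to the target device manually, since OpenMP map clauses only support contiguous array sections and structures. A double** is an array of pointers to rows that are typically allocated separately, so a[0:n][0:n] does not describe a single contiguous region of memory.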

The following code should work but it is very inefficient, so do not use it in an application unless the goal is to benchmark it:

for (int i=0; i<n; i++)
{
    double* la = a[i];
    double* lb = b[i];
    double* lc = c[i];

    #pragma omp target teams distribute parallel for \
            map(to: la[0:n], lb[0:n]) map(from: lc[0:n])
    for (int j=0; j<n; j++)
    {
        lc[j] = la[j] + lb[j];
    }
}

Note that this launches one kernel per row, which is awful, but the code would still be very inefficient even without that (see the comments above for more information). Note also that OpenMP mappers may be used if the number of blocks is known at compile time (and is relatively small).
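
If you control how the matrices are allocated, one possible way to keep the double** type while avoiding per-row transfers is to back all the row pointers with a single contiguous block and map that block instead. The following is only a minimal sketch under that assumption: alloc_matrix, free_matrix and add_matrices are hypothetical helper names, not part of OpenMP, and the approach does not apply if the rows are allocated separately by code you cannot change.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: builds a double** whose rows all live in ONE
   contiguous block of n*n doubles, so m[0] is the start of that block. */
double **alloc_matrix(int n)
{
    assert(n > 0);
    double **rows = malloc((size_t)n * sizeof *rows);
    double *block = malloc((size_t)n * (size_t)n * sizeof *block);
    if (rows == NULL || block == NULL)
    {
        free(rows);
        free(block);
        return NULL;
    }
    for (int i = 0; i < n; i++)
    {
        rows[i] = block + (size_t)i * (size_t)n;
    }
    return rows;
}

/* Hypothetical helper: releases a matrix created by alloc_matrix. */
void free_matrix(double **m)
{
    if (m != NULL)
    {
        free(m[0]); /* the contiguous data block */
        free(m);    /* the table of row pointers */
    }
}

/* Computes c = a + b. Assumption: all three matrices were created by
   alloc_matrix, so a[0], b[0] and c[0] each point to n*n contiguous doubles. */
void add_matrices(double **a, double **b, double **c, int n)
{
    double *fa = a[0];
    double *fb = b[0];
    double *fc = c[0];
    size_t len = (size_t)n * (size_t)n;

    /* One contiguous section per matrix, one kernel for the whole loop nest. */
    #pragma omp target teams distribute parallel for collapse(2) \
            map(to: fa[0:len], fb[0:len]) map(from: fc[0:len])
    for (int i = 0; i < n; i++)
    {
        for (int j = 0; j < n; j++)
        {
            fc[(size_t)i * n + j] = fa[(size_t)i * n + j] + fb[(size_t)i * n + j];
        }
    }
}
```

The rest of the program can still index the matrices as a[i][j] on the host; only the offloaded loop uses the flat view. When no offload device is available, the target region is expected to fall back to the host.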