I doing simulations of N
particles that interact between each other in a given range. To avoid a N^2
computation over the particle, I sort (spatially) the particles in an array in which I stored the index of one particle and then each particles points to another particle which is in the same box. I have already written a sequential code in C and I am trying to implement a OpenMP version to increase the number of particles. To define particles and the array I use two classes
class Boxes
{
int m_NX;
int m_NY;
int m_Nboxes;
Eigen::ArrayXi m_boxes;
...
};
class Particles
{
int m_nbParticles;
Eigen::ArrayXd m_positions;
Eigen::ArrayXi m_nextParticles;
...
};
Then to sort the particles I doing this
void updateBoxes(Boxes &p_boxes, Particles &p_particles)
{
...
for (int i = 0; i < p_particles.m_nbParticles; i )
{
indexX = p_particles.position(i).x() / dX;
indexY = p_particles.position(i).y() / dX;
indexBox = indexX NXboxes*indexY;
p_particles.m_nextParticles[i] = p_boxes.m_boxes[indexBox];
p_boxes.m_boxes[indexBox] = i;
}
}
I try to parallelize this part by adding pragma omp atomic
but I get an error at the compile
#pragma omp parallel for
for (int i = 0; i < p_particles.size(); i )
{
indexX = p_particles.position(i).x() / dX;
indexY = p_particles.position(i).y() / dX;
indexBox = indexX NXboxes*indexY;
#pragma omp atomic
p_particles.m_nextParticles[i] = p_boxes.m_boxes[indexBox];
#pragma omp atomic
p_boxes.m_boxes[indexBox] = i;
}
But it doesn work and I get an error at compile time.
error: the statement for 'atomic' must be an expression statement of form ' x;', '--x;', 'x ;', 'x--;', 'x binop= expr;', 'x = x binop expr' or 'x = expr binop x', where x is an l-value expression with scalar type
I already parallelized the other part of the code and even if this part is around 8% of the total time for a single thread code, it becomes more and more important when I increase the number of thread. I am relatively new with OpenMP and I am stuck on this. What is the best way to parallelize this part of the code?
CodePudding user response:
Your line
p_boxes.m_boxes[indexBox] = i;
can have multiple writes into the same location. Often in this sort of conversions from serial to parallel there will be a reduction happening (say, a sum) and you need to worry about how to parallelize that.
In your serial code however, only the last one of these is preserved. If you don't care about preserving precisely the serially last one, and any saved i
value is good, then you indeed declare this statement atomic and you'll be fine.
CodePudding user response:
To avoid the problem mentioned in the comments and the previous answer I try
omp_lock_t lock[size];
for (int i=0; i<size; i )
omp_init_lock(&(lock[i]));
#pragma omp parallel for
for (int i = 0; i <size; i )
{
int indexX, indexY, indexBox;
indexX = p_particles.position(i).x() / dX;
indexY = p_particles.position(i).y() / dX;
indexBox = indexX NXboxes*indexY;
omp_set_lock(&(lock[i]));
p_particles.m_nextParticles[i] = p_boxes.m_boxes[indexBox];
p_boxes.m_boxes[indexBox] = i;
omp_unset_lock(&(lock[i]));
}
for (int i=0; i<size; i )
omp_destroy_lock(&(lock[i]));
But the routine becomes incredibly slow (x30). Did I do something wrong?