How to use MPI_Allgatherv on a std::vector of std::vectors?

Time:11-09

I have a situation where every MPI process has a nested std::vector that needs to be shared among all other processes with MPI_Allgatherv. Here is a minimal example, where I define a 3x2 nested vector with simple entries (in this case the MPI rank).

#include <mpi.h>
#include <iostream>
#include <vector>

int main ()
{
  MPI_Init(NULL, NULL);
  int mpi_processes, mpi_rank;
  MPI_Comm_size(MPI_COMM_WORLD, &mpi_processes);
  MPI_Comm_rank(MPI_COMM_WORLD, &mpi_rank);

  std::vector<std::vector<int>> local_data(3, std::vector<int>(2, mpi_rank));

I would like to combine the local_data vectors into one global vector {{0,0},{0,0},{0,0},{1,1},{1,1},{1,1},...,{mpi_processes-1,mpi_processes-1},{mpi_processes-1,mpi_processes-1},{mpi_processes-1,mpi_processes-1}}. As I understand it, MPI_Allgatherv should be perfect for this. (In this example all vectors have the same length, but I want to be ready for varying lengths, hence MPI_Allgatherv and not MPI_Allgather.)

Therefore, I go on to define a vector for the global data and the necessary inputs for MPI_Allgatherv, i.e. the receive counts and the displacements.

  std::vector<std::vector<int>> global_data(mpi_processes*3, std::vector<int>(2));

  std::vector<int> recvcounts(mpi_processes, 3);
  std::vector<int> displs(mpi_processes,0);
  for(int i = 1; i < mpi_processes; i++)
    displs[i] = displs[i-1] + recvcounts[i-1];

Finally, my idea was to define an MPI datatype to account for the fact that I have a vector of 2-element vectors.

  MPI_Datatype mpi_datapoint;
  MPI_Type_contiguous(2, MPI_INT, &mpi_datapoint);
  MPI_Type_commit(&mpi_datapoint);

  MPI_Allgatherv(local_data.data(), 3, mpi_datapoint, global_data.data(), recvcounts.data(), displs.data(), mpi_datapoint, MPI_COMM_WORLD);

  MPI_Finalize();
  return 0;
}

This code compiles fine, but when I run it, I get a wall of unhelpful error messages starting with

*** Process received signal ***
Signal: Abort trap: 6 (6)
Signal code:  (0)
[ 0] 0   libsystem_platform.dylib            0x00007ff80375cc1d _sigtramp + 29
 *** Process received signal ***
 Signal: Abort trap: 6 (6)
 Signal code:  (0)

Do you know what is wrong here? And do you know if and how I can use MPI_Allgatherv correctly on a vector of vectors?

Thanks in advance for any help on this!

CodePudding user response:

Do you really need all the data gathered together? If so, a std::vector<std::vector<>> is usually not a good idea, because each inner vector is a separate allocation rather than part of one contiguous block. If you really need 2D access, wrap a single std::vector in a class that provides a 2D indexing function. Or you can use the C++23 std::mdspan.
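
As a rough illustration of the single-vector idea: local_data.data() in the question points at an array of std::vector objects, not at the ints themselves, so MPI ends up writing over vector bookkeeping instead of integer storage. A minimal sketch of a contiguous alternative follows. The names Flat2D and displacements are hypothetical helpers invented for this example, not part of MPI or of the original post.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original post): a row-major 2D view
// over one contiguous std::vector<int>, so data() points at plain ints.
class Flat2D {
public:
  Flat2D(std::size_t rows, std::size_t cols, int value = 0)
      : rows_(rows), cols_(cols), data_(rows * cols, value) {}

  // 2D indexing: element (r, c) lives at offset r * cols + c.
  int& operator()(std::size_t r, std::size_t c) {
    assert(r < rows_ && c < cols_);
    return data_[r * cols_ + c];
  }
  int operator()(std::size_t r, std::size_t c) const {
    assert(r < rows_ && c < cols_);
    return data_[r * cols_ + c];
  }

  // Pointer to the contiguous storage, suitable as a send/receive buffer.
  int* data() { return data_.data(); }
  const int* data() const { return data_.data(); }

  std::size_t rows() const { return rows_; }
  std::size_t cols() const { return cols_; }

private:
  std::size_t rows_;
  std::size_t cols_;
  std::vector<int> data_;
};

// Hypothetical helper: exclusive prefix sum of the receive counts,
// i.e. displs[i] = recvcounts[0] + ... + recvcounts[i-1].
inline std::vector<int> displacements(const std::vector<int>& recvcounts) {
  std::vector<int> displs(recvcounts.size(), 0);
  for (std::size_t i = 1; i < recvcounts.size(); ++i)
    displs[i] = displs[i - 1] + recvcounts[i - 1];
  return displs;
}
```

Assuming the question's setup, local_data could become Flat2D local_data(3, 2, mpi_rank) and global_data could become Flat2D global_data(mpi_processes * 3, 2). The existing MPI_Allgatherv call with mpi_datapoint should then receive real integer storage through the two data() pointers, with recvcounts and displs still counted in rows.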

If you absolutely want that structure, you can build a derived datatype with MPI_Type_create_hindexed, using the absolute addresses of the .data() parts of the inner vectors (as obtained with MPI_Get_address). But I doubt that that's the right solution.
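
If copying is acceptable, a different route that keeps the nested vectors but avoids the derived datatype is to pack the rows into one contiguous buffer before communicating and to rebuild them afterwards. The sketch below swaps in that packing technique; it is not the hindexed approach. Packed, pack and unpack are hypothetical names introduced only for this example, and the MPI calls themselves are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container: all row values back to back, plus each row's length.
struct Packed {
  std::vector<int> values;
  std::vector<int> row_lengths;
};

// Copy a nested vector (rows may differ in length) into contiguous storage.
inline Packed pack(const std::vector<std::vector<int>>& rows) {
  Packed p;
  p.row_lengths.reserve(rows.size());
  for (const auto& row : rows) {
    p.row_lengths.push_back(static_cast<int>(row.size()));
    p.values.insert(p.values.end(), row.begin(), row.end());
  }
  return p;
}

// Rebuild the nested vector from the contiguous values and the row lengths.
inline std::vector<std::vector<int>> unpack(const std::vector<int>& values,
                                            const std::vector<int>& row_lengths) {
  std::vector<std::vector<int>> rows;
  rows.reserve(row_lengths.size());
  std::ptrdiff_t offset = 0;
  for (int len : row_lengths) {
    // The lengths must not run past the end of the value buffer.
    assert(len >= 0);
    assert(static_cast<std::size_t>(offset + len) <= values.size());
    rows.emplace_back(values.begin() + offset, values.begin() + offset + len);
    offset += len;
  }
  return rows;
}
```

One possible sequence, under these assumptions, is for every rank to call pack on its local_data, exchange the row_lengths and then the values with MPI_Allgatherv using MPI_INT, and finally call unpack on the gathered buffers. The receive counts for the values would be each rank's total number of ints, which has to be exchanged beforehand.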
