This question arises in a multithreaded setting where several (e.g. 5) threads are working, each having started listening with MPI_Irecv using MPI_ANY_SOURCE as the source. Before exiting the function, each thread should check whether a message was received, or else cancel the request to free the memory.
The assumption is that a message arrives at only one of the N (e.g. 5) threads. The problem is the race that arises if a message arrives in the interval between (1) checking whether a message has arrived and (2) cancelling the request because that check returned false.
As a side note, using a single receiver that writes to an atomically accessed queue should solve it, but that implies major code refactoring and possibly a performance decrease.
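For reference, the single-receiver alternative could look roughly like the sketch below. It is a minimal illustration, not the program's actual code: Request, RequestQueue, and drain_with_workers are hypothetical names, and the receiver itself (which in the real program would presumably loop on a blocking MPI_Recv with MPI_ANY_SOURCE and call push) is left out so that the sketch needs only the C++17 standard library.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical message type: the payload plus the rank it came from.
struct Request {
    double value;
    int source;
};

// Minimal mutex-protected queue shared by one receiver and N workers.
class RequestQueue {
public:
    // Called by the single receiver for every message it gets.
    void push(Request r) {
        {
            std::lock_guard<std::mutex> lock(m_);
            assert(!closed_ && "push after close");
            q_.push(r);
        }
        cv_.notify_one();
    }

    // Called by the receiver when no more messages will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        cv_.notify_all();
    }

    // Blocks until a request is available; returns an empty optional once
    // the queue has been closed and drained.
    std::optional<Request> pop() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty() || closed_; });
        if (q_.empty()) return std::nullopt;
        Request r = q_.front();
        q_.pop();
        return r;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<Request> q_;
    bool closed_ = false;
};

// Runs n_workers threads that drain the queue until it is closed, and
// returns the sum of the values they handled (a stand-in for real work).
double drain_with_workers(RequestQueue& queue, int n_workers) {
    std::vector<double> partial(n_workers, 0.0);
    std::vector<std::thread> workers;
    for (int w = 0; w < n_workers; ++w) {
        workers.emplace_back([&queue, &partial, w] {
            while (auto r = queue.pop()) partial[w] += r->value;
        });
    }
    for (auto& t : workers) t.join();
    double total = 0.0;
    for (double p : partial) total += p;
    return total;
}
```

Because only one thread ever posts receives in this design, no worker holds a pending request that might need cancelling; a worker simply returns when pop() yields an empty optional.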
The question is whether the MPI standard provides an answer to this problem and, if so, what it is, or else whether the following (pseudo)code is sufficient protection.
The proposed solution seems suspicious, as the logs (see below) only ever show the combination "MPI_Irecv did not capture a message" together with "failure to cancel the related request". There seems to be no memory build-up, though.
In main.cpp:
//...
MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
if (provided < MPI_THREAD_MULTIPLE) {
    error_report("[error] The MPI did not provide the requested threading behaviour.");
}
//...
In the relevant function:
// Start receiving
MPI_Irecv(&buffer, 1, MPI_DOUBLE,
          MPI_ANY_SOURCE,
          VERTEXVAL_REQUEST_FLAG,
          MPI_COMM_WORLD,
          &R);
// some work goes on here ...
// Before exiting, we check if a message arrived.
int flag1=-437, flag2=-437; // any initialization
MPI_Status status1, status2;
status2.MPI_ERROR = -999; // again, any initialization
status1.MPI_ERROR = -999;
MPI_Test(&R, &flag1, &status1);
if (flag1 != 1){
    MPI_Cancel(&R);
    MPI_Test_cancelled(&status2, &flag2);
}
if ((flag1 == 1) || ((flag1!=1) && (flag2!=1))) {
    if (flag1 == 1) {
        build_answer(answer, REF, buffer, status1.MPI_SOURCE, MYPROC);
        printf("A request failed to be cancelled, we are assuming we recieved it! we computed val = %f, recieved buffer = %f ; flags12 = %d %d ; source = %d ; tag = %d; error = %d\n",
               answer, buffer, flag1, flag2, status1.MPI_SOURCE, status1.MPI_TAG, status1.MPI_ERROR);
        std::cout << std::flush;
        MPI_Ssend(&answer, 1, MPI_DOUBLE, status1.MPI_SOURCE, (int) buffer, MPI_COMM_WORLD);
        printf("Completed!\n");
        std::cout << std::flush;
    } else {
        printf("A request failed to be cancelled: will ignore it. Recieved buffer = %f ; flags12 = %d %d ; source = %d ; tag = %d ; status error = %d\n",
               buffer, flag1, flag2, status2.MPI_SOURCE, status2.MPI_TAG, status2.MPI_ERROR);
        std::cout << std::flush;
    }
}
This 'protection' appears to solve the 1-in-1000 deadlocks that used to arise in the program, as the previous version simply assumed that failure to cancel meant the message had arrived. In particular, the log shows the following values printed through printf:
A request failed to be cancelled: will ignore it. Recieved buffer = 0.000000 ; flags12 = 0 22020 ; source = 2 ; tag = 0 ; status error = -183549351
A request failed to be cancelled: will ignore it. Recieved buffer = 0.000000 ; flags12 = 0 0 ; source = 1 ; tag = 25001 ; status error = 0
A request failed to be cancelled: will ignore it. Recieved buffer = 0.000000 ; flags12 = 0 0 ; source = 1 ; tag = 25001 ; status error = 0
A request failed to be cancelled: will ignore it. Recieved buffer = -0.000000 ; flags12 = 0 21998 ; source = 2 ; tag = 0 ; status error = -1563532711
A request failed to be cancelled: will ignore it. Recieved buffer = 16.000000 ; flags12 = 0 0 ; source = 0 ; tag = 25001 ; status error = 0
A request failed to be cancelled: will ignore it. Recieved buffer = 16.000000 ; flags12 = 0 0 ; source = 0 ; tag = 25001 ; status error = 0
A request failed to be cancelled: will ignore it. Recieved buffer = 0.000000 ; flags12 = 0 0 ; source = 0 ; tag = 25001 ; status error = 0
A request failed to be cancelled: will ignore it. Recieved buffer = 0.000000 ; flags12 = 0 22033 ; source = 2 ; tag = 0 ; status error = -691551655
A request failed to be cancelled: will ignore it. Recieved buffer = 0.000000 ; flags12 = 0 0 ; source = 0 ; tag = 25001 ; status error = 0
A request failed to be cancelled: will ignore it. Recieved buffer = 16.000000 ; flags12 = 0 0 ; source = 1 ; tag = 25001 ; status error = 0
A request failed to be cancelled: will ignore it. Recieved buffer = 8.000000 ; flags12 = 0 0 ; source = 0 ; tag = 25001 ; status error = 0
A request failed to be cancelled: will ignore it. Recieved buffer = 16.000000 ; flags12 = 0 0 ; source = 1 ; tag = 25001 ; status error = 0
A request failed to be cancelled: will ignore it. Recieved buffer = 0.000000 ; flags12 = 0 0 ; source = 1 ; tag = 25001 ; status error = 0
A request failed to be cancelled: will ignore it. Recieved buffer = 8.000000 ; flags12 = 0 0 ; source = 0 ; tag = 25001 ; status error = 0
A request failed to be cancelled: will ignore it. Recieved buffer = 0.000000 ; flags12 = 0 0 ; source = 1 ; tag = 25001 ; status error = 0
A request failed to be cancelled: will ignore it. Recieved buffer = 0.000000 ; flags12 = 0 0 ; source = 1 ; tag = 25001 ; status error = 0
A request failed to be cancelled: will ignore it. Recieved buffer = -0.000000 ; flags12 = 0 21998 ; source = 2 ; tag = 0 ; status error = -1563532711
A request failed to be cancelled: will ignore it. Recieved buffer = 0.000000 ; flags12 = 0 22033 ; source = 2 ; tag = 0 ; status error = -691551655
CodePudding user response:
Look into MPI_Mprobe and MPI_Mrecv, which are designed precisely for your multi-threaded scenario. With them it should not be necessary to cancel receives. For details, see https://www.slideshare.net/jsquyres/mpimprobe-is-good-for-you
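As a rough illustration of the matched-probe approach, a thread could poll for a pending message instead of posting MPI_Irecv up front. The sketch below is an assumption about how that might look in the asker's code, not a tested drop-in replacement: try_receive_one is a hypothetical helper, and it uses MPI_Improbe, the non-blocking variant of MPI_Mprobe, on the assumption that the threads must keep doing other work. It needs an MPI-3 implementation initialized with MPI_THREAD_MULTIPLE, as in the question.

```cpp
#include <mpi.h>

// Hypothetical helper: tries once to receive a single double with the
// given tag from any source. Returns true and fills value/source if a
// message was matched, false if nothing was pending.
bool try_receive_one(MPI_Comm comm, int tag, double& value, int& source) {
    int found = 0;
    MPI_Message msg;
    MPI_Status status;

    // Non-blocking matched probe: if a message is pending, it is removed
    // from the matching queue and handed to this thread through msg.
    MPI_Improbe(MPI_ANY_SOURCE, tag, comm, &found, &msg, &status);
    if (!found) return false;

    // Only the holder of msg can receive this particular message, so no
    // other thread can take it between the probe and the receive.
    MPI_Mrecv(&value, 1, MPI_DOUBLE, &msg, &status);
    source = status.MPI_SOURCE;
    return true;
}
```

A worker would call something like try_receive_one(MPI_COMM_WORLD, VERTEXVAL_REQUEST_FLAG, buffer, src) at the points where it currently tests its request. Since no receive is posted until a message has actually been matched, there is no outstanding request left to cancel when the function exits.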