MPI Matrix Multiplication - Task sharing when size(of procs) < rows(of matrix A)

I am trying to perform matrix-matrix multiplication in MPI using C.

I have coded the case where number_of_processes = number_of_rows_of_matrix_A: each row of matrix_A is sent to a different process, matrix_B is broadcast to all processes, each process computes its part of the product, and the partial results are sent back to the root process, which assembles them into matrix_C. I have also coded the case where number_of_processes > number_of_rows_of_matrix_A.

I have no idea how to approach the case where number_of_processes < rows_of_matrix_A.

Let's say I have 4 processes and 8 * 8 matrices matrix_A and matrix_B. I can easily assign the first 4 rows to ranks 0, 1, 2, 3. How should I assign the remaining rows so that I don't mess up the synchronization of the results I get back from the processes?

Side note on my implementation: I have used only MPI_Send and MPI_Recv in everything I have written so far.

Thanks in advance.

CodePudding user response：

Let N be the number of rows and P the number of processes. Then process p starts at row floor(p*N/P) and stops just before row floor((p+1)*N/P). Try it. This gives a nicely even distribution: the block sizes differ by at most one row.
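A minimal sketch of that formula in C, in case it helps. The function name first_row is made up for illustration, and it assumes rows are numbered from 0 and that p, N and P are non-negative, so that integer division behaves like floor.

```c
#include <assert.h>

/* First row owned by process p when N rows are split over P processes.
 * Hypothetical helper: for non-negative operands, C integer division
 * truncates toward zero, which equals floor(p*N/P). */
static int first_row(int p, int N, int P)
{
    assert(P > 0 && N >= 0 && p >= 0 && p <= P);
    /* long long avoids overflow of p*N for large matrices */
    return (int)(((long long)p * N) / P);
}
```

Process p then owns rows first_row(p, N, P) up to, but not including, first_row(p + 1, N, P). With N = 8 and P = 4 that is rows 0-1, 2-3, 4-5 and 6-7 for ranks 0 to 3.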

CodePudding user response：

Based on the suggestions from people here, I arrived at the solution below. The number of rows assigned to process j is:

floor(N * (j + 1)/P) - floor(N * j/P)

Where:
N : Number of rows in the matrix
P : Total number of processes available
j : Index (rank) of the process (e.g. if P = 4, j = 0, 1, 2, 3)
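For anyone who wants to check the counts, here is a rough sketch in plain C (no MPI calls, so it can be tried on its own). The names row_count and build_row_layout are invented for this example, and it assumes N, P and j are non-negative so integer division matches floor.

```c
#include <assert.h>

/* Rows assigned to process j: floor(N*(j+1)/P) - floor(N*j/P).
 * Hypothetical helper; integer division of non-negative values is the floor. */
static int row_count(int j, int N, int P)
{
    assert(P > 0 && N >= 0 && j >= 0 && j < P);
    long long lo = ((long long)N * j) / P;
    long long hi = ((long long)N * (j + 1)) / P;
    return (int)(hi - lo);
}

/* Fill counts[j] (rows for process j) and offsets[j] (its first row)
 * for all P processes. Returns the total number of rows covered,
 * which should equal N. Both arrays must have room for P ints. */
static int build_row_layout(int N, int P, int *counts, int *offsets)
{
    int next = 0;
    for (int j = 0; j < P; ++j) {
        counts[j] = row_count(j, N, P);
        offsets[j] = next;      /* block j starts where block j-1 ended */
        next += counts[j];
    }
    return next;
}
```

With N = 10 and P = 4 this gives counts 2, 3, 2, 3 and offsets 0, 2, 5, 7. One way to use it with plain MPI_Send / MPI_Recv would be for the root to send counts[j] rows of matrix_A starting at row offsets[j] to rank j, and later store what it receives from rank j into matrix_C starting at the same row offsets[j], so the order of the results is fixed by the rank and not by arrival time.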