Home > Software engineering >  Using multiple threads to do matrix Multiplication in C
Using multiple threads to do matrix Multiplication in C

Time:09-02

So, I was trying to write a program to do matrix multiplication using multiple threads and then plot a graph between the time taken and the number of threads used. I used the following approach:

#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
#include <stdlib.h>
#include <time.h>
#include <sys/time.h>
pthread_mutex_t lock;

#define M 200
#define N 300
#define P 400
#define X 2 // Number of Threads
#define RED "\x1b[31m"
#define GREEN "\x1b[32m"

int A[M][N], B[N][P], C[M][P], D[M][P];

int row = 0;

void *matrixMulti(void *arg)
{
    pthread_mutex_lock(&lock);
    int i = row  ;

    for (int j = 0; j < P; j  )
    {
        C[i][j] = 0;
        for (int k = 0; k < N; k  )
        {
            C[i][j]  = A[i][k] * B[k][j];
        }
    }

    
    pthread_exit(NULL);
    pthread_mutex_unlock(&lock);
}

void matrixMultiplicationWithoutThreading();
void matrixMultiplicationWithThreading();
void verifyIfBothMatrixAreSame();
int main()
{
    int m, n, p;
    // A: m*n Matrix, B: n*p Matrix
    for (int i = 0; i < M; i  )
        for (int j = 0; j < N; j  )
            A[i][j] = rand() % 10;
    // scanf("%d", &A[i][j]);
    for (int i = 0; i < N; i  )
        for (int j = 0; j < P; j  )
            B[i][j] = rand() % 10;
    // scanf("%d", &B[i][j]);
    
    struct timeval start, end;
    gettimeofday(&start, NULL);
    matrixMultiplicationWithoutThreading();
    gettimeofday(&end, NULL);
    double time = (end.tv_sec - start.tv_sec) * 1e6;
    time = (time   end.tv_usec - start.tv_usec) * 1e-6;
    printf("The time taken by simple matrix calculation without threding is %0.6f\n", time);

    struct timeval start_th, end_th;
    gettimeofday(&start_th, NULL);
    matrixMultiplicationWithThreading();
    gettimeofday(&end_th, NULL);
    time = (end_th.tv_sec - start_th.tv_sec) * 1e6;
    time = (time   end_th.tv_usec - start_th.tv_usec) * 1e-6;
    printf("The time taken by using the Threading Method with %d threads is %0.6f\n", X, time);

    verifyIfBothMatrixAreSame();
}

void matrixMultiplicationWithThreading()
{
    pthread_t threads[X];
    for (int i = 0; i < X; i  )
    {
        threads[i] = (pthread_t)-1;
    }

    // Computation Started:
    for (int i = 0; i < M; i  )
    {

        // At any moment only X threads at max are working
        if (threads[i] == (pthread_t)-1)
            pthread_create(&threads[i % X], NULL, matrixMulti, NULL);
        else
        {
            pthread_join(threads[i % X], NULL);
            pthread_create(&threads[i % X], NULL, matrixMulti, NULL);
        }
    }
    for (int i = 0; i < X; i  )
        pthread_join(threads[i], NULL);
    // Computation Done:
}

void matrixMultiplicationWithoutThreading()
{
    // Computation Started:
    for (int i = 0; i < M; i  )
        for (int j = 0; j < P; j  )
        {
            D[i][j] = 0;
            for (int k = 0; k < N; k  )
                D[i][j]  = A[i][k] * B[k][j];
        }
    // Computation Done:
}
void verifyIfBothMatrixAreSame()
{
    for (int i = 0; i < M; i  )
        for (int j = 0; j < P; j  )
        {
            if (C[i][j] != D[i][j])
            {
                printf(RED "\nMatrix's are not equal something wrong with the computation\n");
                return;
            }
        }
    printf(GREEN "\nBoth Matrixes are equal thus verifying the computation\n");
}

Now, this code works sometimes, and sometimes it doesn't, like the result does not match the actual result. Similarly, this code gives a segmentation fault in one of the Linux virtual machines. Also, even when it works correctly, it doesn't give the asymptotically decreasing graph. Instead, the time is almost constant with arbitrary variations with the thread number.

Can someone help with this, like why this is happening? I found multiple solutions to this problem on the internet; some of them don't work (rarely but it happens), but I haven't seen my approach yet; it might be an issue I think. So, can anyone comment on using pthread_create(&threads[i % X], NULL, matrixMulti, NULL), like why this is not a good idea?

EDITED: I have tried taking the suggestion and optimising the code, I have not done the Matrix multiplication efficient method, as we were asked to do the O(n^3) method, but I have tried doing the threading correctly. Is this correct?

#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
#include <stdlib.h>
#include <time.h>
#include <sys/time.h>
#include <math.h>

#define M 2
#define N 2
#define P 2
#define X 40 // Number of Threads
#define RED "\x1b[31m"
#define GREEN "\x1b[32m"

int t = 0; // Computation done by the first usedXFullthreads
int usedXFull = 0;

int A[M][N], B[N][P], C[M][P], D[M][P];

int row = 0;

void *matrixMulti(void *arg)
{
    int* l = (int *)arg;
    int n = *l;
    int i = 0, j = 0, k = 0, comp = 0;
    if (n <= usedXFull)
    {
        i = n * t / (N * P);
        j = (n * t - N * P * i) / N;
        k = n * t - N * P * i - N * j;
        if (n == usedXFull)
            comp = M * N * P - usedXFull * t;
        else
            comp = t;
    }
    while (comp)
    {
        if (i == M)
            printf(RED "Some fault in the code\n\n");
        C[i][j]  = A[i][k] * B[k][j];
        comp--;
        k  ;
        if (k == N)
        {
            j  ;
            if (j == P)
            {
                i  ;
                j = 0;
            }
            k = 0;
        }
    }

    return NULL;
}

void matrixMultiplicationWithoutThreading();
void matrixMultiplicationWithThreading();
void verifyIfBothMatrixAreSame();
int main()
{
    int m, n, p;
    // A: m*n Matrix, B: n*p Matrix
    for (int i = 0; i < M; i  )
        for (int j = 0; j < N; j  )
            A[i][j] = rand() % 10;
    // scanf("%d", &A[i][j]);
    for (int i = 0; i < N; i  )
        for (int j = 0; j < P; j  )
            B[i][j] = rand() % 10;
    // scanf("%d", &B[i][j]);
    for (int i = 0; i < M; i  )
        for (int j = 0; j < P; j  )
            C[i][j] = 0;

    struct timeval start, end;
    gettimeofday(&start, NULL);
    matrixMultiplicationWithoutThreading();
    gettimeofday(&end, NULL);
    double time = (end.tv_sec - start.tv_sec) * 1e6;
    time = (time   end.tv_usec - start.tv_usec) * 1e-6;
    printf("The time taken by simple matrix calculation without threding is %0.6f\n", time);

    struct timeval start_th, end_th;
    gettimeofday(&start_th, NULL);
    matrixMultiplicationWithThreading();
    gettimeofday(&end_th, NULL);
    time = (end_th.tv_sec - start_th.tv_sec) * 1e6;
    time = (time   end_th.tv_usec - start_th.tv_usec) * 1e-6;
    printf("The time taken by using the Threading Method with %d threads is %0.6f\n", X, time);

    verifyIfBothMatrixAreSame();
}

void matrixMultiplicationWithThreading()
{
    int totalComp = M * N * P; // Total Computation
    if ((double)totalComp / (double)X - (totalComp / X) > 0.5)
        t = ceil((double)totalComp / (double)X);
    else
        t = floor((double)totalComp / (double)X);
    if (t == 0)
        t  ; // Takes care of the condition when if total number of comutation is less than the total threads, very unlikely
    usedXFull = totalComp / t;
    int computationByLastUsedThread = totalComp - t * usedXFull;
    int computationIndex[X];

    pthread_t threads[X];

    // Computation Started:
    for (int i = 0; i < X; i  )
    {
        if (i >= usedXFull)
            computationIndex[i] = -1;
        else
            computationIndex[i] = i;
        int rc = pthread_create(&threads[i], NULL, matrixMulti, (void *)&computationIndex[i]);
        if (rc)
        {
            printf(RED "ERROR; return code from pthread_create() is %d\n", rc);
            exit(-1);
        }
    }
    for (int i = 0; i < X; i  )
        pthread_join(threads[i], NULL);
    // Computation Done:
}

void matrixMultiplicationWithoutThreading()
{
    // Computation Started:
    for (int i = 0; i < M; i  )
        for (int j = 0; j < P; j  )
        {
            D[i][j] = 0;
            for (int k = 0; k < N; k  )
                D[i][j]  = A[i][k] * B[k][j];
        }
    // Computation Done:
}
void verifyIfBothMatrixAreSame()
{
    for (int i = 0; i < M; i  )
        for (int j = 0; j < P; j  )
        {
            if (C[i][j] != D[i][j])
            {
                printf(RED "\nMatrix's are not equal something wrong with the computation\n");
                return;
            }
        }
    printf(GREEN "\nBoth Matrixes are equal thus verifying the computation\n");
}

CodePudding user response:

There are many issues in the code. Here are some points:

  • lock is not initialized with pthread_mutex_init which is required (nor freed).
  • There is no need for locks in a matrix multiplication: work sharing should be preferred instead (especially since the current lock make your code run fully serially).
  • Using pthread_exit is generally rather a bad idea, at least it is here. Consider just returning NULL. Besides, returning something is mandatory in matrixMulti. Please enable compiler warnings so to detect such a thing.
  • There is an out of bound of threads[i] in the 0..M based loop.
  • There is no need to create M threads. You can create 2 threads and divide the work in 2 even parts along the M-based dimension. Creating M threads while allowing only 2 threads to run simultaneously just add more overhead for no reason (it takes time for thread to be created and scheduled by the OS).
  • It is generally better to dynamically allocate large arrays than using static global C arrays.
  • It is better to avoid global variables and use the arg parameter so to get thread-specific data.

To design a fast matrix multiplication, please consider reading this article. For example, the ijk loop nest is very inefficient and should really not be used for sake of performance (not efficient in cache). Besides, note you can use a BLAS library for that (they are highly optimized and easy to use) though I guess this is a homework. Additionally, note that you can use OpenMP instead of pthread so to make the code shorter and easy to read.

  • Related