showing the variation in elapsed time with number of threads change in openMP code-CodePudding

I have implemented Sieve of Eratosthenes to find out the prime numbers using openMp method for various term values and thread.

Here is my code

// Type your code here, or load an example.
#include <stdio.h>
#include<math.h>
#include<string.h>
#include<stdlib.h>
#include<omp.h>
 

void SieveOfEratosthenes(int n)
{  
      
    bool prime[n   1];
    memset(prime, true, sizeof(prime));
    
    const int end_p=sqrt(n);
    #pragma omp parallel for 
    for (int p = 2; p <= end_p; p  )
    {
        bool prime_p;
        #pragma omp atomic read
        prime_p=prime[p];
        if (prime_p == true)
        {
            for (int i = p * p; i <= n; i  = p)
            {
                #pragma omp atomic write
                prime[i] = false;
            }
        }
    }
    
    for (int p = 2; p <= n; p  )
        if (prime[p])
            printf("%d  ", p);
    
}
int main(int argc, char* argv[]){
        int procs;
        int term =100;
        double start,finish;
        procs = omp_get_num_procs();
        omp_set_num_threads(procs);
        start = omp_get_wtime();
        SieveOfEratosthenes(term);
        finish = omp_get_wtime();
        printf("Elapsed time = %e seconds\n", finish - start);

        return 0;
}

I am going to show the output of elapsed time for various terms and threads

Here is the result:

1.term=100
a)Thread 1 , elapsed time= 3.879014e-04 seconds
b)Thread 2, elapsed time = 3.887471e-04 seconds
c)Thread 4, elapsed time = 3.742063e-04 seconds
d)Thread 8, elapsed time = 3.988100e-04 seconds
e)Thread 16, elapsed time = 5.262811e-04 seconds

2.term = 100000
a)Thread 1, elapsed time = 6.131708e-03
b)Thread 2, elapsed time = 4.231855e-03
c)Thread 4, elapsed time = 4.193152e-03
d)Thread 8, elapsed time = 6.109135e-03
e)Thread 16, elapsed time = 4.225969e-03

3.term = 100000000
a)Thread 1, elapsed time = 1.237387e 01
b)Thread 2, elapsed time = 1.184531e 01
c)Thread 4, elapsed time = 1.160130e 01
d)Thread 8, elapsed time = 1.128761e 01
e)Thread 16, elapsed time = 1.18116e 01seconds

Now I can see from the statistics when term is 100 elapsed time increased for thread 8,16(1-d),e))

When term = 100000, elapsed time is increased for thread 8(2 d)), then again decrease.

When term = 100000000, elapsed time is increased in for number of threads 16 than number of threads 8.

My confusion is when the task divided on number of threads elapsed time should decrease. I mean if the number of threads increased elapsed time decreased. However, I saw a variation in my result.

It would be great if someone help me to find out what I missed in my code.

Thank you.

CodePudding user response：

Use omp_set_num_threads(thread_count); in your main to set the number of threads.
Exclude the printout from speed measurement, as it has nothing to do with the calculation and it is the slowest part of your code.
Use dynamic schedule #pragma omp parallel for schedule(dynamic). The load balance is not optimal using the default schedule, because the bigger the p the lower the workload.
Your code has a race condition, which is undefined behavior in C language.

I saw that atomic operation did not have any effect on the code. I got same result with them without them.

Please understand that even if you happen to get correct results on a given compiler/architecture (e.g. x86/64), it does not necessarily mean that your code is correct. Your code should work properly (and as designed) using any compilers/computer architectures.

After these changes, I see speedup:

term=100000000;
Thread 1, Elapsed time = 6.903768e-01 seconds
Thread 8, Elapsed time = 2.813646e-01 seconds