Optimize estimating Pi function C-CodePudding

I've wrote a program, that approximates Pi using the Monte-Carlo method. It is working fine, but I wonder if I can make it work better and faster, because when inserting something like ~n = 100000000 and bigger from that point, it takes some time to do the calculations and print the result.

I've imagined how I could try to approximate it better by doing a median for n results, but considering how slow my algorithm is for big numbers, I decided not doing so.

Basically, the question is: How can I make this function work faster?

Here is the code that I've got so far:

double estimate_pi(double n)
{
    int i;
    double x,y, distance;
    srand( (unsigned)time(0));
    double num_point_circle = 0;
    double num_point_total = 0;
    double final;

    for (i=0; i<n; i  )
    {
        x = (double)rand()/RAND_MAX;
        y = (double)rand()/RAND_MAX;
        distance = sqrt(x*x   y*y);
        if (distance <= 1)
        {
            num_point_circle =1;
        }
        num_point_total =1;
    }
    final = ((4 * num_point_circle) / num_point_total);
    return final;
}

CodePudding user response：

An obvious (small) speedup is to get rid of the square root operation. sqrt(x*x y*y) is exactly smaller than 1, when x*x y*y is also smaller than 1. So you can just write:

double distance2 = x*x   y*y;
if (distance2 <= 1) {
    ...
}

That might gain something, as computing the square root is an expensive operation, and takes much more time (~4-20 times slower than one addition, depending on CPU, architecture, optimization level, ...).

You should also use integer values for variables where you count, like num_point_circle, n and num_point_total. Use int, long or long long for them.

The Monte Carlo algorithm is also embarrassingly parallel. So an idea to speed it up would be to run the algorithm with multiple threads, each one processes a fraction of n, and then at the end sum up the number of points that are inside the circle.

Additionally you could try to improve the speed with SIMD instructions.

And as last, there are much more efficient ways of computing Pi. With the Monte Carlo method you can do millions of iterations, and only receive an accuracy of a couple of digits. With better algorithms its possible to compute thousands, millions or billions of digits.

E.g. you could read up on the on the website https://www.i4cy.com/pi/

CodePudding user response：

First of all, you should modify the code to consider how much the estimate changes after a certain number of iterations and stop when it reached a satisfiying accuracy.

For your code as is, with a fixed number of iterations, there isnt much to optimize that the compiler cannot do better than you. One minor thing you can improve is to drop computing the square root in every iteration. You don't need it, because sqrt(x) <= 1 is true if x <= 1.

Very roughly speaking you can expect a good estimate from your Monte Carlo method once your x,y points are uniformly distributed in the quarter circle. Considering that, there is a much simpler way to get points uniformly distributed without randomness: Use a uniform grid and count how many points are inside the circle and how many are outside.

I would expect this to converge much better (when decreasing the grid spacing) than Monte Carlo. However, you can also try to use it to speed up your Monte Carlo code by starting from counts of num_point_circle and num_point_total obtained from points on a uniform grid and then incrementing the counters by continuing with randomly distributed points.

CodePudding user response：

As the error decreases as the inverse of the square root of the number of samples, to gain a single digit you need one hundred times more samples. No micro-optimization will help significantly. Just consider that this method is of no use to effectively compute π.

If you insist: https://en.wikipedia.org/wiki/Variance_reduction.

CodePudding user response：

when Monte Carlo is an estimate anyway and the final multiplication is a multiple of 2,4,8 and so on you can also do bit operations. Any if statement makes it slow, so trying to get rid of it. But when we increase 1 or nothing (0) anyway you can get rid of the statement and reduce it to simple math, which should be faster. When i is initialised before the loop, and we are counting up inside the loop, it can also be a while loop.

double estimate_alt_pi(double n){
    int i = 0;
    double x, y;
    srand( (unsigned)time(0));
    int num_point_circle = 0;
    double num_point_total = 0;
    while(i<n) {
        i  ;
        x = (double)rand()/RAND_MAX;
        y = (double)rand()/RAND_MAX;
        num_point_circle  = (sqrt(x*x   y*y) <= 1);
        num_point_total  ;
    }
    return ((num_point_circle << 2) / num_point_total);
    //return ((4 * num_point_circle) / num_point_total);
}

Benchmarked with Xcode on a mac, looks like.

#include <time.h>
extern uint64_t dispatch_benchmark(size_t count, void (^block)(void));

int main(int argc, const char * argv[]) {
    size_t count = 1000000;
    double input = 1222.52764523423;
    __block double resultB;
    __block double resultA;
    uint64_t t = dispatch_benchmark(count,^{
        resultA = estimate_pi(input);
    });
    uint64_t t2 = dispatch_benchmark(count,^{
        resultB = estimate_alt_pi(input);
    });
    fprintf(stderr,"estimate_alt_pi=%f time used=%llu\n",resultA, t);
    fprintf(stderr,"estimate_____pi=%f time used=%llu\n",resultB, t2);
    return 0;
}

~1,35 times faster, or takes ~73% of the original time
but significant less difference when the given number is lower.

And also the whole algorithm works properly only up to inputs with maximal 7 digits.