Fastest way to get square root in float value-CodePudding

I am trying to find a fastest way to make square root of any float number in C . I am using this type of function in a huge particles movement calculation like calculation distance between two particle, we need a square root etc. So If any suggestion it will be very helpful. I have tried and below is my code

#include <math.h>
#include <iostream>
#include <chrono>

using namespace std;
using namespace std::chrono;

#define CHECK_RANGE 100

inline float msqrt(float a)
{
    int i;
    for (i = 0;i * i <= a;i  );
    
    float lb = i - 1; //lower bound
    if (lb * lb == a)
        return lb;
    float ub = lb   1; // upper bound
    float pub = ub; // previous upper bound
    for (int j = 0;j <= 20;j  )
    {
        float ub2 = ub * ub;
        if (ub2 > a)
        {
            pub = ub;
            ub = (lb   ub) / 2; // mid value of lower and upper bound
        }
        else
        {
            lb = ub; 
            ub = pub;
        }
    }
    return ub;
}

void check_msqrt()
{
    for (size_t i = 0; i < CHECK_RANGE; i  )
    {
        msqrt(i);
    }
}

void check_sqrt()
{
    for (size_t i = 0; i < CHECK_RANGE; i  )
    {
        sqrt(i);
    }
}

int main()
{
    auto start1 = high_resolution_clock::now();
    check_msqrt();
    auto stop1 = high_resolution_clock::now();

    auto duration1 = duration_cast<microseconds>(stop1 - start1);
    cout << "Time for check_msqrt = " << duration1.count() << " micro secs\n";


    auto start2 = high_resolution_clock::now();
    check_sqrt();
    auto stop2 = high_resolution_clock::now();

    auto duration2 = duration_cast<microseconds>(stop2 - start2);
    cout << "Time for check_sqrt = " << duration2.count() << " micro secs";
    
    //cout << msqrt(3);

    return 0;
}

output of above code showing the implemented method 4 times more slow than sqrt of math.h file. I need faster than math.h version.

CodePudding user response：

In short, I do not think it is possible to implement something generally faster than the standard library version of sqrt.

Performance is a very important parameter when implementing standard library functions and it is fair to assume that such a commonly used function as sqrt is optimized as much as possible.

Beating the standard library function would require a special case, such as:

Availability of a suitable assembler instruction - or other specialized hardware support - on the particular system for which the standard library has not been specialized.
Knowledge of the needed range or precision. The standard library function must handle all cases specified by the standard. If the application only needs a subset of that or maybe only requires an approximate result then perhaps an optimization is possible.
Making a mathematical reduction of the calculations or combine some calculation steps in a smart way so an efficient implementation can be made for that combination.

CodePudding user response：

I have also tried with the algorithm mention in https://en.wikipedia.org/wiki/Fast_inverse_square_root, but not found desired result, please check

#include <math.h>
#include <iostream>
#include <chrono>

#include <bit>
#include <limits>
#include <cstdint>

using namespace std;
using namespace std::chrono;

#define CHECK_RANGE 10000

inline float msqrt(float a)
{
    int i;
    for (i = 0;i * i <= a;i  );
    
    float lb = i - 1; //lower bound
    if (lb * lb == a)
        return lb;
    float ub = lb   1; // upper bound
    float pub = ub; // previous upper bound
    for (int j = 0;j <= 20;j  )
    {
        float ub2 = ub * ub;
        if (ub2 > a)
        {
            pub = ub;
            ub = (lb   ub) / 2; // mid value of lower and upper bound
        }
        else
        {
            lb = ub; 
            ub = pub;
        }
    }
    return ub;
}

/* mentioned here ->  https://en.wikipedia.org/wiki/Fast_inverse_square_root */
inline float Q_sqrt(float number)
{
    union Conv {
        float    f;
        uint32_t i;
    };
    Conv conv;
    conv.f= number;
    conv.i = 0x5f3759df - (conv.i >> 1);
    conv.f *= 1.5F - (number * 0.5F * conv.f * conv.f);
    return 1/conv.f;
}

void check_Qsqrt()
{
    for (size_t i = 0; i < CHECK_RANGE; i  )
    {
        Q_sqrt(i);
    }
}

void check_msqrt()
{
    for (size_t i = 0; i < CHECK_RANGE; i  )
    {
        msqrt(i);
    }
}

void check_sqrt()
{
    for (size_t i = 0; i < CHECK_RANGE; i  )
    {
        sqrt(i);
    }
}

int main()
{
    auto start1 = high_resolution_clock::now();
    check_msqrt();
    auto stop1 = high_resolution_clock::now();

    auto duration1 = duration_cast<microseconds>(stop1 - start1);
    cout << "Time for check_msqrt = " << duration1.count() << " micro secs\n";


    auto start2 = high_resolution_clock::now();
    check_sqrt();
    auto stop2 = high_resolution_clock::now();

    auto duration2 = duration_cast<microseconds>(stop2 - start2);
    cout << "Time for check_sqrt = " << duration2.count() << " micro secs\n";
    
    auto start3 = high_resolution_clock::now();
    check_Qsqrt();
    auto stop3 = high_resolution_clock::now();

    auto duration3 = duration_cast<microseconds>(stop3 - start3);
    cout << "Time for check_Qsqrt = " << duration3.count() << " micro secs\n";

    //cout << Q_sqrt(3);
    //cout << sqrt(3);
    //cout << msqrt(3);
    return 0;
}

CodePudding user response：

Here's another alternative to binary search. It may not be as fast as std::sqrt, haven't tested it. But it will definitely be faster than your binary search.

auto
Sqrt(float x)
{
    using F = decltype(x);
    if (x == 0 || x == INFINITY || isnan(x))
        return x;
    if (x < 0)
        return F{NAN};
    int e;
    x = std::frexp(x, &e);
    if (e % 2 != 0)
    {
          e;
        x /= 2;
    }
    auto y = (F{-160}/567*x   F{2'848}/2'835)*x   F{155}/567;
    y = (y   x/y)/2;
    y = (y   x/y)/2;
    return std::ldexp(y, e/2);
}

After getting /-0, nan, inf, and negatives out of the way, it works by decomposing the float into a mantissa in the range of [¹/₄, 1) times 2^e where e is an even integer. The answer is then sqrt(mantissa)* 2^{^e/₂}.

Finding the sqrt of the mantissa can be guessed at with a least squares quadratic curve fit in the range [¹/₄, 1]. Then that good guess is refined by two iterations of Newton–Raphson. This will get you within 1 ulp of the correctly rounded result. A good std::sqrt will typically get that last bit correct.