Home > Software design >  Fastest way to get square root in float value
Fastest way to get square root in float value

Time:03-04

I am trying to find a fastest way to make square root of any float number in C . I am using this type of function in a huge particles movement calculation like calculation distance between two particle, we need a square root etc. So If any suggestion it will be very helpful. I have tried and below is my code

#include <math.h>
#include <iostream>
#include <chrono>

using namespace std;
using namespace std::chrono;

#define CHECK_RANGE 100

inline float msqrt(float a)
{
    int i;
    for (i = 0;i * i <= a;i  );
    
    float lb = i - 1; //lower bound
    if (lb * lb == a)
        return lb;
    float ub = lb   1; // upper bound
    float pub = ub; // previous upper bound
    for (int j = 0;j <= 20;j  )
    {
        float ub2 = ub * ub;
        if (ub2 > a)
        {
            pub = ub;
            ub = (lb   ub) / 2; // mid value of lower and upper bound
        }
        else
        {
            lb = ub; 
            ub = pub;
        }
    }
    return ub;
}

void check_msqrt()
{
    for (size_t i = 0; i < CHECK_RANGE; i  )
    {
        msqrt(i);
    }
}

void check_sqrt()
{
    for (size_t i = 0; i < CHECK_RANGE; i  )
    {
        sqrt(i);
    }
}

int main()
{
    auto start1 = high_resolution_clock::now();
    check_msqrt();
    auto stop1 = high_resolution_clock::now();

    auto duration1 = duration_cast<microseconds>(stop1 - start1);
    cout << "Time for check_msqrt = " << duration1.count() << " micro secs\n";


    auto start2 = high_resolution_clock::now();
    check_sqrt();
    auto stop2 = high_resolution_clock::now();

    auto duration2 = duration_cast<microseconds>(stop2 - start2);
    cout << "Time for check_sqrt = " << duration2.count() << " micro secs";
    
    //cout << msqrt(3);

    return 0;
}

output of above code showing the implemented method 4 times more slow than sqrt of math.h file. I need faster than math.h version. enter image description here

CodePudding user response:

In short, I do not think it is possible to implement something generally faster than the standard library version of sqrt.

Performance is a very important parameter when implementing standard library functions and it is fair to assume that such a commonly used function as sqrt is optimized as much as possible.

Beating the standard library function would require a special case, such as:

  • Availability of a suitable assembler instruction - or other specialized hardware support - on the particular system for which the standard library has not been specialized.
  • Knowledge of the needed range or precision. The standard library function must handle all cases specified by the standard. If the application only needs a subset of that or maybe only requires an approximate result then perhaps an optimization is possible.
  • Making a mathematical reduction of the calculations or combine some calculation steps in a smart way so an efficient implementation can be made for that combination.

CodePudding user response:

I have also tried with the algorithm mention in https://en.wikipedia.org/wiki/Fast_inverse_square_root, but not found desired result, please check

#include <math.h>
#include <iostream>
#include <chrono>

#include <bit>
#include <limits>
#include <cstdint>

using namespace std;
using namespace std::chrono;

#define CHECK_RANGE 10000

inline float msqrt(float a)
{
    int i;
    for (i = 0;i * i <= a;i  );
    
    float lb = i - 1; //lower bound
    if (lb * lb == a)
        return lb;
    float ub = lb   1; // upper bound
    float pub = ub; // previous upper bound
    for (int j = 0;j <= 20;j  )
    {
        float ub2 = ub * ub;
        if (ub2 > a)
        {
            pub = ub;
            ub = (lb   ub) / 2; // mid value of lower and upper bound
        }
        else
        {
            lb = ub; 
            ub = pub;
        }
    }
    return ub;
}

/* mentioned here ->  https://en.wikipedia.org/wiki/Fast_inverse_square_root */
inline float Q_sqrt(float number)
{
    union Conv {
        float    f;
        uint32_t i;
    };
    Conv conv;
    conv.f= number;
    conv.i = 0x5f3759df - (conv.i >> 1);
    conv.f *= 1.5F - (number * 0.5F * conv.f * conv.f);
    return 1/conv.f;
}

void check_Qsqrt()
{
    for (size_t i = 0; i < CHECK_RANGE; i  )
    {
        Q_sqrt(i);
    }
}

void check_msqrt()
{
    for (size_t i = 0; i < CHECK_RANGE; i  )
    {
        msqrt(i);
    }
}

void check_sqrt()
{
    for (size_t i = 0; i < CHECK_RANGE; i  )
    {
        sqrt(i);
    }
}

int main()
{
    auto start1 = high_resolution_clock::now();
    check_msqrt();
    auto stop1 = high_resolution_clock::now();

    auto duration1 = duration_cast<microseconds>(stop1 - start1);
    cout << "Time for check_msqrt = " << duration1.count() << " micro secs\n";


    auto start2 = high_resolution_clock::now();
    check_sqrt();
    auto stop2 = high_resolution_clock::now();

    auto duration2 = duration_cast<microseconds>(stop2 - start2);
    cout << "Time for check_sqrt = " << duration2.count() << " micro secs\n";
    
    auto start3 = high_resolution_clock::now();
    check_Qsqrt();
    auto stop3 = high_resolution_clock::now();

    auto duration3 = duration_cast<microseconds>(stop3 - start3);
    cout << "Time for check_Qsqrt = " << duration3.count() << " micro secs\n";

    //cout << Q_sqrt(3);
    //cout << sqrt(3);
    //cout << msqrt(3);
    return 0;
}

CodePudding user response:

Here's another alternative to binary search. It may not be as fast as std::sqrt, haven't tested it. But it will definitely be faster than your binary search.

auto
Sqrt(float x)
{
    using F = decltype(x);
    if (x == 0 || x == INFINITY || isnan(x))
        return x;
    if (x < 0)
        return F{NAN};
    int e;
    x = std::frexp(x, &e);
    if (e % 2 != 0)
    {
          e;
        x /= 2;
    }
    auto y = (F{-160}/567*x   F{2'848}/2'835)*x   F{155}/567;
    y = (y   x/y)/2;
    y = (y   x/y)/2;
    return std::ldexp(y, e/2);
}

After getting /-0, nan, inf, and negatives out of the way, it works by decomposing the float into a mantissa in the range of [1/4, 1) times 2e where e is an even integer. The answer is then sqrt(mantissa)* 2e/2.

Finding the sqrt of the mantissa can be guessed at with a least squares quadratic curve fit in the range [1/4, 1]. Then that good guess is refined by two iterations of Newton–Raphson. This will get you within 1 ulp of the correctly rounded result. A good std::sqrt will typically get that last bit correct.

  • Related