I am trying to find a fastest way to make square root of any float number in C . I am using this type of function in a huge particles movement calculation like calculation distance between two particle, we need a square root etc. So If any suggestion it will be very helpful. I have tried and below is my code
#include <math.h>
#include <iostream>
#include <chrono>
using namespace std;
using namespace std::chrono;
#define CHECK_RANGE 100
inline float msqrt(float a)
{
int i;
for (i = 0;i * i <= a;i );
float lb = i - 1; //lower bound
if (lb * lb == a)
return lb;
float ub = lb 1; // upper bound
float pub = ub; // previous upper bound
for (int j = 0;j <= 20;j )
{
float ub2 = ub * ub;
if (ub2 > a)
{
pub = ub;
ub = (lb ub) / 2; // mid value of lower and upper bound
}
else
{
lb = ub;
ub = pub;
}
}
return ub;
}
void check_msqrt()
{
for (size_t i = 0; i < CHECK_RANGE; i )
{
msqrt(i);
}
}
void check_sqrt()
{
for (size_t i = 0; i < CHECK_RANGE; i )
{
sqrt(i);
}
}
int main()
{
auto start1 = high_resolution_clock::now();
check_msqrt();
auto stop1 = high_resolution_clock::now();
auto duration1 = duration_cast<microseconds>(stop1 - start1);
cout << "Time for check_msqrt = " << duration1.count() << " micro secs\n";
auto start2 = high_resolution_clock::now();
check_sqrt();
auto stop2 = high_resolution_clock::now();
auto duration2 = duration_cast<microseconds>(stop2 - start2);
cout << "Time for check_sqrt = " << duration2.count() << " micro secs";
//cout << msqrt(3);
return 0;
}
output of above code showing the implemented method 4 times more slow than sqrt of math.h file. I need faster than math.h version.
CodePudding user response:
In short, I do not think it is possible to implement something generally faster than the standard library version of sqrt
.
Performance is a very important parameter when implementing standard library functions and it is fair to assume that such a commonly used function as sqrt
is optimized as much as possible.
Beating the standard library function would require a special case, such as:
- Availability of a suitable assembler instruction - or other specialized hardware support - on the particular system for which the standard library has not been specialized.
- Knowledge of the needed range or precision. The standard library function must handle all cases specified by the standard. If the application only needs a subset of that or maybe only requires an approximate result then perhaps an optimization is possible.
- Making a mathematical reduction of the calculations or combine some calculation steps in a smart way so an efficient implementation can be made for that combination.
CodePudding user response:
I have also tried with the algorithm mention in https://en.wikipedia.org/wiki/Fast_inverse_square_root, but not found desired result, please check
#include <math.h>
#include <iostream>
#include <chrono>
#include <bit>
#include <limits>
#include <cstdint>
using namespace std;
using namespace std::chrono;
#define CHECK_RANGE 10000
inline float msqrt(float a)
{
int i;
for (i = 0;i * i <= a;i );
float lb = i - 1; //lower bound
if (lb * lb == a)
return lb;
float ub = lb 1; // upper bound
float pub = ub; // previous upper bound
for (int j = 0;j <= 20;j )
{
float ub2 = ub * ub;
if (ub2 > a)
{
pub = ub;
ub = (lb ub) / 2; // mid value of lower and upper bound
}
else
{
lb = ub;
ub = pub;
}
}
return ub;
}
/* mentioned here -> https://en.wikipedia.org/wiki/Fast_inverse_square_root */
inline float Q_sqrt(float number)
{
union Conv {
float f;
uint32_t i;
};
Conv conv;
conv.f= number;
conv.i = 0x5f3759df - (conv.i >> 1);
conv.f *= 1.5F - (number * 0.5F * conv.f * conv.f);
return 1/conv.f;
}
void check_Qsqrt()
{
for (size_t i = 0; i < CHECK_RANGE; i )
{
Q_sqrt(i);
}
}
void check_msqrt()
{
for (size_t i = 0; i < CHECK_RANGE; i )
{
msqrt(i);
}
}
void check_sqrt()
{
for (size_t i = 0; i < CHECK_RANGE; i )
{
sqrt(i);
}
}
int main()
{
auto start1 = high_resolution_clock::now();
check_msqrt();
auto stop1 = high_resolution_clock::now();
auto duration1 = duration_cast<microseconds>(stop1 - start1);
cout << "Time for check_msqrt = " << duration1.count() << " micro secs\n";
auto start2 = high_resolution_clock::now();
check_sqrt();
auto stop2 = high_resolution_clock::now();
auto duration2 = duration_cast<microseconds>(stop2 - start2);
cout << "Time for check_sqrt = " << duration2.count() << " micro secs\n";
auto start3 = high_resolution_clock::now();
check_Qsqrt();
auto stop3 = high_resolution_clock::now();
auto duration3 = duration_cast<microseconds>(stop3 - start3);
cout << "Time for check_Qsqrt = " << duration3.count() << " micro secs\n";
//cout << Q_sqrt(3);
//cout << sqrt(3);
//cout << msqrt(3);
return 0;
}
CodePudding user response:
Here's another alternative to binary search. It may not be as fast as std::sqrt
, haven't tested it. But it will definitely be faster than your binary search.
auto
Sqrt(float x)
{
using F = decltype(x);
if (x == 0 || x == INFINITY || isnan(x))
return x;
if (x < 0)
return F{NAN};
int e;
x = std::frexp(x, &e);
if (e % 2 != 0)
{
e;
x /= 2;
}
auto y = (F{-160}/567*x F{2'848}/2'835)*x F{155}/567;
y = (y x/y)/2;
y = (y x/y)/2;
return std::ldexp(y, e/2);
}
After getting /-0, nan, inf, and negatives out of the way, it works by decomposing the float
into a mantissa in the range of [1/4, 1) times 2e where e
is an even integer. The answer is then sqrt(mantissa)* 2e/2.
Finding the sqrt of the mantissa can be guessed at with a least squares quadratic curve fit in the range [1/4, 1]. Then that good guess is refined by two iterations of Newton–Raphson. This will get you within 1 ulp of the correctly rounded result. A good std::sqrt
will typically get that last bit correct.