The Intel Intrinsics Guide says that the intrinsic _mm256_rsqrt_ps
has a relative error of at most 1.5*2^-12
. But, when I compare the result of _mm256_rsqrt_ps
to a standard C calculation of the inverse of the square root (1.0 / sqrt(x)
), I get a relative error much greater than 1.5*2^-12
.
I used the following program to test this:
#include <immintrin.h>
#include <iostream>
#include <math.h>
void test(float x) {
float resP = _mm256_cvtss_f32(_mm256_rsqrt_ps(_mm256_set1_ps(x)));
float res = 1.0 / sqrt(x);
float relErr = fabs(resP - res) / res;
std::cout << "x = " << x << std::endl;
std::cout << "resP = " << resP << std::endl;
std::cout << "res = " << res << std::endl;
std::cout << "relErr = " << relErr << std::endl;
}
int main() {
test(1e30);
test(1e-30);
test(1e17);
test(1e-17);
}
It outputs the following:
x = 1e 30
resP = 1.00007e-15
res = 1e-15
relErr = 6.80803e-05
x = 1e-30
resP = 9.99868e 14
res = 1e 15
relErr = 0.0001316
x = 1e 17
resP = 3.16186e-09
res = 3.16228e-09
relErr = 0.000132569
x = 1e-17
resP = 3.16277e 08
res = 3.16228e 08
relErr = 0.000154825
As you can see, the relative error is significantly greater than 1.5*2^-12
.
The instruction _mm256_rcp_ps
also seems to have a much greater relative error than the intrinsics guide says it has.
Am I doing something wrong? Am I misunderstanding the intrinsics guide? Or is the intrinsics guide wrong?
CodePudding user response:
Your relative error is in the bound.
1.5*2^-12 = 0.000366
It's just powers of 2 not powers of 10.
Also is does not claim to have this relative error in comparison to the single precision 1/sqrt(x) but to the exact result.