Consider the following piece of C++ code:
#include <iostream>
#include <cmath>
using namespace std;
int main()
{
    cout.precision(1000000000);
    float a, b, c;
    a = 1;
    b = -1;
    c = pow(2, -50);
    cout << "a = " << a << endl;
    cout << "b = " << b << endl;
    cout << "c = " << c << endl;
    float ab = a + b;
    float bc = b + c;
    float abc = ab + c;
    float bca = bc + a;
    cout << "a + b = " << ab << endl;
    cout << "b + c = " << bc << endl;
    cout << "(a + b) + c = " << abc << endl;
    cout << "(b + c) + a = " << bca << endl;
    return 0;
}
Which yields the output:
a = 1
b = -1
c = 8.8817841970012523233890533447265625e-16
a + b = 0
b + c = -1
(a + b) + c = 8.8817841970012523233890533447265625e-16
(b + c) + a = 0
Why is b + c = -1?
I am not getting my head around this effect of the IEEE 754 standard.
To my understanding, the exponent ranges from -126 to 127 (8 bits for the biased exponent, with a bias of 127).
So 2^(-50) is representable without an issue, as are 1 and -1. None of them is a subnormal (denormalized) number, if I understand the standard correctly.
But why does the addition -1 + 2^(-50) result in -1, with the smaller number being neglected?
Thanks in advance for any help!
CodePudding user response:
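For single precision, the IEEE 754 standard specifies 1 sign bit, 8 exponent bits and 23 stored mantissa bits, which together with the implicit leading 1 give 24 bits of precision. When performing addition, the mantissas of the two numbers are aligned to a common exponent, so 2^-50 is a 1 shifted right by 50 bits relative to 1. This puts it far outside the 24-bit mantissa available for the result, so it is rounded away. You can check this by repeating your experiment with 2^-24, which still changes the result, and with 2^-25, which no longer does.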
CodePudding user response:
You are using float, which is (at least) single precision. Use double instead.
And -1 + 9e-16 is within roundoff of -1 in single precision.