Given the sequence
1/1, 1/2, 1/3, ... , 1/n
How can I calculate at what point I will not be able to distinguish, with precision E, between two consecutive elements 1/i and 1/(i+1) if I use numpy.float16? i.e. what is 'i'?
What about the other NumPy float types?
What is the smallest E, and how do I calculate 'i' for it?
For example, if E = 0.01 I can distinguish between 1/9 and 1/10, but not between 1/10 and 1/11, because:
1/9 = 0.111
1/10 = 0.100
1/11 = 0.091
0.111 - 0.100 = 0.011 >= E
0.100 - 0.091 = 0.009 < E
i = 10
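In code, a brute-force version of that check might look like this (a sketch; find_i is a made-up helper, and E/dtype are the knobs from the question):

import numpy as np

def find_i(E, dtype=np.float64):
    # first i where 1/i and 1/(i+1) differ by less than E
    i = 1
    while dtype(1 / i) - dtype(1 / (i + 1)) >= E:
        i += 1
    return i

print(find_i(0.01))  # 10, matching the example above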
More abstractly, given f(i), what is the maximum 'i' for which f(i) and f(i+1) are still distinguishable in np.floatXX?
Interestingly, the precision in practice is worse than calculated. Here is the place where the logic breaks:
for i in range(int(1e3), int(12e6)):
    if not np.floatXX(1/i) > np.floatXX(1/(i + 1)):
        print(i); break
float32: 11864338
float16: 1464
CodePudding user response:
Instead of adding 1, double the denominator. You can safely assume that the answer is some binary (power-of-two) number. Here is one simple method:
import numpy as np

one = np.float64(1.0)
two = np.float64(2.0)
n = one
bits = 0
while one + n != one:  # halve n until adding it to 1.0 no longer registers
    bits += 1
    n /= two
You start with bits = 0 because otherwise you would get the count of bits that took you past resolution. In the end, you get bits = 53, which is the number of significand (mantissa) bits in an IEEE-754 64-bit floating point number.
That means that for any number n, which is encoded in what is effectively binary scientific notation, the ULP (unit of least precision) is approximately n * 2**-53, where n is the number rounded to its highest bit. You won't be able to resolve smaller relative changes in a float.
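As a quick cross-check, np.spacing gives the exact gap between a number and the next representable one; the n * 2**-53 figure (with n rounded to its highest bit) is the half-ULP rounding threshold, so expect roughly a factor of two between the two:

import numpy as np

x = np.float64(1.5)
print(np.spacing(x))  # 2.220446049250313e-16, i.e. 2**-52, the exact ULP near 1.5
print(1 * 2.0**-53)   # 1.1102230246251565e-16, the n * 2**-53 estimate with n = 2**0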
Bonus: Running the above code for the other floating point types gives:
float16 (half): 11 bits
float32 (single): 24 bits
float64 (double): 53 bits
float96 (x87 extended double, sometimes longdouble): 64 bits
float128 (when it is a true IEEE quad): 113 bits
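Here is the same loop wrapped in a function so you can reproduce that table yourself (a sketch; float96/float128 only exist on some platforms, hence the getattr guard):

import numpy as np

def mantissa_bits(dtype):
    # count significand bits by halving n until 1 + n rounds back to 1
    one = dtype(1.0)
    two = dtype(2.0)
    n = one
    bits = 0
    while one + n != one:
        bits += 1
        n /= two
    return bits

for name in ('float16', 'float32', 'float64', 'float96', 'float128'):
    dtype = getattr(np, name, None)  # platform-dependent types may be missing
    if dtype is not None:
        # np.finfo(dtype).nmant + 1 should give the same count
        print(name, mantissa_bits(dtype))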
You can modify the code above to work for any target number:
target = np.float16(0.0004883)
one = np.float16(1.0)
two = np.float16(2.0)
n = two**(np.floor(np.log2(target)) - one)  # start one power of two below target's leading bit
bits = 0
while target + n != target:
    bits += 1
    n /= two
The result (ULP) is given by n * two, since the loop stops only after you lose resolution. This is the same reason we start with bits = 0. In this case:
>>> n * two
5e-07
You can entirely short-circuit the computation if you know the number of bits in the mantissa up front. So for float16, where bits = 11, you can do
>>> two**(np.floor(np.log2(target)) - np.float16(bits) + one)
5e-07
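NumPy also exposes this value directly: np.spacing(x) returns the distance from x to the next representable number in the same dtype, so it should agree with the loop's result:
>>> np.spacing(target)
5e-07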
Read more here:
CodePudding user response:
My other answer provides the theory behind what you are actually asking, but needs some non-trivial interpretation. Here is the missing step:
Given some integer i, you can write
1 / i - 1 / (i + 1) =
(i + 1 - i) / (i * (i + 1)) =
1 / (i * (i + 1)) =
1 / (i**2 + i)
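A quick exact sanity check of that identity, using Python's fractions module and an arbitrary i:

from fractions import Fraction

i = 10  # arbitrary example value
assert Fraction(1, i) - Fraction(1, i + 1) == Fraction(1, i**2 + i)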
To find an i such that 1 / (i**2 + i) falls below the ULP of 1 / i in some binary representation, you can use my other answer pretty directly.
The ULP of 1 / i is given by
ulp = 2**(floor(log2(1 / i)) - (bits + 1))
You can try to find an i such that
1 / (i**2 + i) < 2**(floor(log2(1 / i)) - (bits + 1))
1 / (i**2 + i) < 2**floor(log2(1 / i)) / 2**(bits + 1)
2**(bits + 1) < (i**2 + i) * 2**floor(log2(1 / i))
It's not trivial to solve as written because of the floor operation, and Wolfram Alpha runs out of time. Since I am cheap and don't want to buy Mathematica, let's just approximate:
2**(bits + 1) < (i**2 + i) * 2**floor(log2(1 / i))
2**(bits + 1) < (i**2 + i) / i      (approximating 2**floor(log2(1 / i)) as 1 / i)
2**(bits + 1) < i + 1
You may be off by one or so, but you should see that around i = 2**(bits + 1) - 1, the difference stops being resolvable. And indeed, for the 11-bit mantissa of float16, we see:
>>> np.float16(1 / (2**12 - 1)) - np.float16(1 / (2**12))
0.0
The actual number is a tiny bit less here (remember the approximation where we dropped the floor):
>>> np.float16(1 / (2**12 - 5)) - np.float16(1 / (2**12 - 4))
0.0
>>> np.float16(1 / (2**12 - 6)) - np.float16(1 / (2**12 - 5))
2.4e-07
As you noted in your comments, i is
>>> 2**12 - 6
4090
You can compute the exact values for all the other floating point types in a similar manner. But that is indeed left as an exercise for the reader.
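If you'd rather not do the exercise by hand, here is a brute-force sketch (last_resolvable is a made-up helper; it scans downward from the predicted cutoff 2**(bits + 1), so it terminates after only a handful of steps):

import numpy as np

def last_resolvable(dtype, bits):
    # scan down from the predicted cutoff to the last i where
    # dtype(1/i) and dtype(1/(i + 1)) are still distinct
    i = 2 ** (bits + 1)
    while dtype(1 / i) == dtype(1 / (i + 1)):
        i -= 1
    return i

print(last_resolvable(np.float16, 11))  # 4090, as observed above
print(last_resolvable(np.float32, 24))  # a few below 2**25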