I'm trying to ensure the sum of a list of floats/ints is always equal to 1 where the individual list members cannot equal 0 or 1. To do this, I first use numpy clip to prevent the existence of 0 and 1s and then I check that the sum of the list isclose to 1 within the tolerance of my clip. This works in some cases but not in others. Am I missing something obvious here or is this down to floating point issues?
In [1]: import numpy as np
In [2]: ag = [0,1,0]
In [3]: np.clip(ag, 1e-8, 1 - 1e-8)
Out[3]: array([1.0000000e-08, 9.9999999e-01, 1.0000000e-08])
In [4]: np.clip(ag, 1e-8, 1 - 1e-8).sum()
Out[4]: 1.00000001
In [5]: np.isclose(np.clip([0,1,0], 1e-8, 1 - 1e-8).sum(), 1.0, rtol=0, atol=1e-8)
Out[5]: True
In [6]: np.isclose(np.clip([0,1,], 1e-8, 1 - 1e-8).sum(), 1.0, rtol=0, atol=1e-8)
Out[6]: True
In [7]: np.isclose(np.clip([1], 1e-8, 1 - 1e-8).sum(), 1.0, rtol=0, atol=1e-8)
Out[7]: False
CodePudding user response:
You use np.clip() and np.isclose() correctly, and no, you are not missing anything obvious. The problem is a non-obvious one: floating-point rounding produces results extremely close to your chosen tolerance of 1e-8, because the amount by which you clip and the tolerance with which you compare are exactly the same. Finding the source of the discrepancy requires a close look at np.isclose().
Let ε denote the absolute tolerance, i.e. ε=1e-8. The 2nd input, [0,1], causes no problem: it is clipped to [ε, 1-ε], which sums to 1.0 and therefore has an absolute difference of 0. The clipping errors cancel out for this input. This can be verified by
np.clip([0,1,], 1e-8, 1 - 1e-8).sum()
>1.0
However, the 1st and the 3rd inputs are different, as the errors do not cancel out.
First, the input [0,1,0] is clipped to [ε, 1-ε, ε], which sums to 1+ε. The absolute difference is therefore (at least symbolically) abs(1+ε-1)=abs(ε)=ε=1e-8. This does not quite coincide with the arithmetic output, which is
np.clip([0,1,0], 1e-8, 1 - 1e-8).sum() - 1
>9.99999993922529e-09
The (symbolic) result lands right on the "edge" of your tolerance atol=1e-8. Internally, np.isclose() checks the condition
absolute(a - b) <= (atol + rtol * absolute(b))
with atol=1e-08, rtol=0, and b=1 in your case. Because 9.99999993922529e-09 <= 1e-08 is true, so is the output of
np.isclose(np.clip([0,1,0], 1e-8, 1 - 1e-8).sum(), 1.0, rtol=0, atol=1e-8)
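To make the comparison concrete, here is a minimal sketch that evaluates the same inequality by hand for the first input. The helper name isclose_by_hand is hypothetical and only mirrors the documented condition for finite scalar inputs; it is not NumPy's actual implementation.

```python
import numpy as np

def isclose_by_hand(a, b, rtol=0.0, atol=1e-8):
    # Hypothetical helper: the inequality np.isclose documents for finite values
    return bool(abs(a - b) <= (atol + rtol * abs(b)))

# First input from the question: symbolically clipped to [eps, 1-eps, eps]
total = np.clip([0, 1, 0], 1e-8, 1 - 1e-8).sum()
diff = abs(total - 1.0)

print(diff)                           # slightly below 1e-8
by_hand = isclose_by_hand(total, 1.0)
by_numpy = bool(np.isclose(total, 1.0, rtol=0, atol=1e-8))
print(by_hand, by_numpy)
```

Both checks should agree, since the difference falls just under the tolerance.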
This leaves us with the mysterious 3rd case. Symbolically, [1] is clipped to [1-ε], trivially yielding the sum 1-ε. We would expect the same outcome as for the first input since, symbolically, abs(1-ε-1)=abs(-ε)=ε. However, Python performs floating-point arithmetic, not symbolic arithmetic. Plugging in the input of case 3, we obtain
abs(0.99999999 - 1) <= 1e-08
>False
Taking a closer look, we see that
abs(0.99999999 - 1)
> 1.0000000050247593e-08
A floating-point rounding error struck, since symbolically we should observe exactly 1e-08. Because 1.0000000050247593e-08 > 1e-08 holds, np.isclose() returns False.
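The asymmetry between the 1st and 3rd cases can be inspected directly. The sketch below is only an illustration: it assumes IEEE 754 double precision (the usual Python float), and the variable names are made up for this example. It rounds 1-ε and 1+ε to the nearest double and compares each distance from 1 against ε.

```python
import numpy as np

eps = 1e-8
below = 1.0 - eps  # nearest double to 1 - eps
above = 1.0 + eps  # nearest double to 1 + eps

# Spacing of adjacent doubles just below and just above 1.0
gap_below = 1.0 - np.nextafter(1.0, 0.0)
gap_above = np.nextafter(1.0, 2.0) - 1.0
print(gap_below, gap_above)

dist_below = abs(below - 1.0)
dist_above = abs(above - 1.0)
print(dist_below, dist_below <= eps)  # rounded distance exceeds eps
print(dist_above, dist_above <= eps)  # rounded distance stays under eps
```

Neither 1-ε nor 1+ε is exactly representable, so each is rounded to a neighbouring double. In one case the rounding pushes the distance slightly past ε, in the other it stays slightly under.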
Solution
You could either loosen the tolerance of the numerical comparison (e.g. by setting atol=1e-7) or reduce the size of the clipping constant ε (e.g. by using np.clip(..., 1e-9, 1 - 1e-9)).
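As a quick check of both options, the following sketch runs the three sample inputs through the original settings and the two suggested variants. The function name sums_to_one and its parameters are hypothetical, introduced only for this illustration.

```python
import numpy as np

inputs = [[0, 1, 0], [0, 1], [1]]

def sums_to_one(values, clip_eps, atol):
    # Clip away exact 0s and 1s, then compare the sum to 1 with an absolute tolerance
    total = np.clip(values, clip_eps, 1 - clip_eps).sum()
    return bool(np.isclose(total, 1.0, rtol=0, atol=atol))

original = [sums_to_one(v, 1e-8, 1e-8) for v in inputs]
looser_atol = [sums_to_one(v, 1e-8, 1e-7) for v in inputs]
smaller_clip = [sums_to_one(v, 1e-9, 1e-8) for v in inputs]

print(original)      # the third input fails with equal clip and tolerance
print(looser_atol)
print(smaller_clip)
```

Either change leaves some margin between the clipping error and the tolerance, so rounding no longer decides the result. Note that the margin shrinks as more elements get clipped, because the clipping errors add up.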
Non-Solution
Switching from np.isclose() to math.isclose() is futile, as both return the same output for your three sample inputs.
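A small side-by-side sketch, assuming the same clip bounds and tolerance as in the question, shows the two functions agreeing on the sample inputs. Note that math.isclose() names its keywords rel_tol and abs_tol, not rtol and atol.

```python
import math
import numpy as np

results = []
for values in ([0, 1, 0], [0, 1], [1]):
    total = float(np.clip(values, 1e-8, 1 - 1e-8).sum())
    np_result = bool(np.isclose(total, 1.0, rtol=0, atol=1e-8))
    math_result = math.isclose(total, 1.0, rel_tol=0, abs_tol=1e-8)
    results.append((np_result, math_result))
    print(values, np_result, math_result)
```

With a relative tolerance of 0, both functions reduce to comparing the absolute difference against 1e-8, so the same rounding error decides the outcome.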