Strange behaviour when combining numpy clip with numpy isclose

I'm trying to ensure the sum of a list of floats/ints is always equal to 1 where the individual list members cannot equal 0 or 1. To do this, I first use numpy clip to prevent the existence of 0 and 1s and then I check that the sum of the list isclose to 1 within the tolerance of my clip. This works in some cases but not in others. Am I missing something obvious here or is this down to floating point issues?

In [1]: import numpy as np

In [2]: ag = [0,1,0]

In [3]: np.clip(ag, 1e-8, 1 - 1e-8)
Out[3]: array([1.0000000e-08, 9.9999999e-01, 1.0000000e-08])

In [4]: np.clip(ag, 1e-8, 1 - 1e-8).sum()
Out[4]: 1.00000001

In [5]: np.isclose(np.clip([0,1,0], 1e-8, 1 - 1e-8).sum(), 1.0, rtol=0, atol=1e-8)
Out[5]: True

In [6]: np.isclose(np.clip([0,1,], 1e-8, 1 - 1e-8).sum(), 1.0, rtol=0, atol=1e-8)
Out[6]: True

In [7]: np.isclose(np.clip([1], 1e-8, 1 - 1e-8).sum(), 1.0, rtol=0, atol=1e-8)
Out[7]: False

CodePudding user response：

You are using np.clip() and np.isclose() correctly, and no, you are not missing anything obvious. What you have run into is non-obvious: floating-point rounding produces results that sit right at your chosen tolerance of 1e-8, because the bound you clip with and the tolerance you compare against are exactly the same value. Finding the source of the discrepancy requires a close look at np.isclose().

Let's refer symbolically to ε as the absolute tolerance, i.e. ε=1e-8. It is clear that the 2nd input [0,1] is not causing any problem since you clip it to [ε, 1-ε] which has a sum of 1.0 and therefore an absolute difference of 0. The errors through clipping cancel out for this input. This can be verified by

np.clip([0,1,], 1e-8, 1 - 1e-8).sum()
>1.0

However, the 1st and the 3rd inputs are different, as the errors do not cancel out.

First, the input [0,1,0] will be clipped to [ε, 1-ε, ε] which sums to 1+ε. Therefore, the absolute difference is (at least symbolically) abs(1+ε-1)=abs(ε)=ε=1e-8. This does not quite coincide with the arithmetic output which is

np.clip([0,1,0], 1e-8, 1 - 1e-8).sum() - 1
>9.99999993922529e-09

The symbolic result lands right on the "edge" of your tolerance atol=1e-8, while the computed difference falls just below it. Internally, np.isclose() checks the condition

absolute(a - b) <= (atol + rtol * absolute(b))

with atol=1e-08, rtol=0, and b=1 in your case. Because 9.99999993922529e-09 <= 1e-08 holds, the following call returns True:

np.isclose(np.clip([0,1,0], 1e-8, 1 - 1e-8).sum(), 1.0, rtol=0, atol=1e-8)
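
To see where each input lands relative to the tolerance, the condition can be re-evaluated by hand. This is only a sketch: manual_isclose is a hypothetical helper that mirrors the documented formula for finite scalars, it is not NumPy's internal code, and the inputs are written as floats here.

```python
import numpy as np

EPS = 1e-8  # clipping bound and absolute tolerance used in the question


def manual_isclose(a, b, rtol=0.0, atol=EPS):
    """Hypothetical helper: evaluate the documented np.isclose condition by hand."""
    return bool(abs(a - b) <= atol + rtol * abs(b))


samples = [[0.0, 1.0, 0.0], [0.0, 1.0], [1.0]]
results = []
for values in samples:
    total = np.clip(values, EPS, 1 - EPS).sum()
    diff = float(abs(total - 1.0))
    manual = manual_isclose(total, 1.0)
    numpy_result = bool(np.isclose(total, 1.0, rtol=0, atol=EPS))
    results.append((diff, manual, numpy_result))
    # Print the actual difference next to both verdicts.
    print(values, repr(diff), manual, numpy_result)
```

The printed differences show how far each sum is from 1.0 and on which side of atol it falls.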

This leaves us with the mysterious 3rd case. Symbolically, [1] will be clipped to [1-ε] - trivially yielding the sum 1-ε. We would therefore expect the same outcome as for the first input since, symbolically, abs(1-ε-1)=abs(-ε)=ε. However, Python performs floating-point arithmetic, not symbolic arithmetic. Plugging in the input of case 3, we obtain

abs(0.99999999 - 1) <= 1e-08
>False

Taking a closer look, we see that

abs(0.99999999 - 1)
> 1.0000000050247593e-08

A floating-point rounding error has struck: symbolically we should observe exactly 1e-08. Since 1.0000000050247593e-08 > 1e-08 holds, the function np.isclose() returns False.
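
The rounding can be made visible with the standard decimal module, which prints the exact value a float stores. This is a small sketch; the variable names are only illustrative.

```python
from decimal import Decimal

eps = 1e-8
clipped_one = 1 - eps  # the value that [1] is clipped to

# Decimal(float) shows the exact binary value stored, without rounding for display.
print(Decimal(eps))
print(Decimal(clipped_one))

gap = 1 - clipped_one  # distance from the clipped value back up to 1
print(repr(gap))
print(gap > eps)  # the gap is slightly larger than the tolerance
```

Neither 1e-8 nor 1 - 1e-8 is exactly representable in binary floating point, so the stored clipped value ends up slightly further from 1 than the stored tolerance.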

Solution

You could either loosen the tolerance of the numerical comparison (e.g. by setting atol=1e-7) or reduce the size of the clipping constant ε (e.g. by using np.clip(..., 1e-9, 1 - 1e-9)). Either way, the clipping bound and the tolerance no longer coincide.
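
Both options can be checked against the three sample inputs. A minimal sketch, where sums_to_one is a hypothetical helper name and the inputs are written as floats:

```python
import numpy as np

samples = [[0.0, 1.0, 0.0], [0.0, 1.0], [1.0]]


def sums_to_one(values, eps, atol):
    """Hypothetical helper: clip to [eps, 1 - eps], then compare the sum with 1."""
    total = np.clip(values, eps, 1 - eps).sum()
    return bool(np.isclose(total, 1.0, rtol=0, atol=atol))


# Original setup: clipping bound and tolerance are the same value.
original = [sums_to_one(v, eps=1e-8, atol=1e-8) for v in samples]
# Option 1: keep the clipping bound, loosen the tolerance.
looser_atol = [sums_to_one(v, eps=1e-8, atol=1e-7) for v in samples]
# Option 2: keep the tolerance, shrink the clipping bound.
smaller_eps = [sums_to_one(v, eps=1e-9, atol=1e-8) for v in samples]

print(original)
print(looser_atol)
print(smaller_eps)
```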

Non-Solution

Switching from np.isclose() to math.isclose() does not help: with the equivalent tolerances (rel_tol=0, abs_tol=1e-8) it returns the same results for your three sample inputs.
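
A short comparison illustrates this. It assumes the math.isclose() call uses rel_tol=0.0 and abs_tol=1e-8 to match the np.isclose() arguments from the question; with math.isclose()'s default tolerances the results need not agree.

```python
import math

import numpy as np

EPS = 1e-8
samples = [[0.0, 1.0, 0.0], [0.0, 1.0], [1.0]]

math_results = []
numpy_results = []
for values in samples:
    total = float(np.clip(values, EPS, 1 - EPS).sum())
    # Assumed equivalent settings: purely absolute tolerance in both functions.
    math_results.append(math.isclose(total, 1.0, rel_tol=0.0, abs_tol=EPS))
    numpy_results.append(bool(np.isclose(total, 1.0, rtol=0, atol=EPS)))

print(math_results)
print(numpy_results)
```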