I was tortured by the floating point comparison issue in Matlab in the previous a couple of weeks.
My code involves tons of a<b
or c<=d
.
What drives me crazy is that
dw = 0.001;
W1 = [0:dw:1];
W2 = [0:dw:1];
Then we have
>>W1(418) = 0.417000000000000
>>W2(418) = 0.417000000000000
>>W1(418)>=W2(418)
ans =
logical
0
>>W2(418)>=W1(418)
ans =
logical
1
My current way of dealing with this issue is that I define an error term eps0 = 10^(-15)
. Then, whenever I am expecting to have a<=b
, I do a<=b eps0
.
But I was wondering if there is a more generic way of solving this problem? I really appreciate it!
CodePudding user response:
No, there is no generic way to solve floating-point number comparison, each situation is different, and the expected rounding error will change with the situation.
Instead of using eps0 = 10^(-15)
, use eps(b)
: a<=b eps(b)
. b eps(b)
is the next larger value that can be represented (assuming b
is positive). This should take care of one form of rounding error. If rounding errors accumulate, you need a larger margin: a<=b 10*eps(b)
, for example.
CodePudding user response:
This is what you told us you did, and here things work out as expected in my version of R2018a:
>> version
ans =
'9.4.0.813654 (R2018a)'
>> dw = 0.001;
>> W1 = [0:dw:1];
>> W2 = [0:dw:1];
>> W1(418)>=W2(418)
ans =
logical
1
>> W1(418) = 0.417000000000000;
>> W2(418) = 0.417000000000000;
>> W1(418)>=W2(418)
ans =
logical
1
>> W2(418)>=W1(418)
ans =
logical
1
So I strongly suspect there is something else you haven't told us about. One way I can see getting your result is if the underlying vectors are not the same type, so that the assignment of 0.417 gets converted differently. E.g., suppose W1 was in fact a single type instead of a double type. Then you would get this result:
>> dw = 0.001;
>> W1 = single([0:dw:1]);
>> W2 = [0:dw:1];
>> W1(418) = 0.417000000000000; % converted to closest SINGLE PRECISION bit pattern
>> W2(418) = 0.417000000000000; % converted to closest DOUBLE PRECISION bit pattern
>> W1(418)>=W2(418)
ans =
logical
0
>> W2(418)>=W1(418)
ans =
logical
1
>> sprintf('%.60f\n',W1(418))
ans =
'0.416999995708465576171875000000000000000000000000000000000000
'
>> sprintf('%.60f\n',W2(418))
ans =
'0.416999999999999981792342396147432737052440643310546875000000
'
So here the two numbers W1(418) and W2(418) are in fact different because one is stored as a single precision floating point and the other is stored as a double precision floating point. And the conversion of 0.417 into the binary IEEE floating point bit patterns differs between the two. The single bit pattern turns out to be slightly less in value that the double bit pattern, and the result is perfectly explainable.
So ... are your W1 and W2 vectors actually different types? I suspect they are, which calls into question all of your comparison logic. And as others have pointed out already, floating point arithmetic results in general have to be taken with a grain of salt and comparisons made very carefully (with tolerances, etc.).