Is the modulus operator susceptible to floating point errors?

Time:09-26

I want to create a setter for a double variable num, but I would only like to update it if the input is a multiple of 0.5.

Here's what I have, but I'm worried about floating-point errors.

public void setNum(double num) {
    if (num % 0.5 == 0.0) {
        this.num = num;
    }
}

I assume that for some inputs that actually are a multiple of 0.5, it might return some 0.0000003 or 0.49999997, thus not 0.0.

What can I do to remedy this? Or is this not a problem in this case?

CodePudding user response:

Unless you're dealing with really big floating point numbers, you won't lose accuracy for something that actually is an exact multiple of 0.5, because 0.5 is exactly expressible in binary. But for a number that is close enough to a multiple of 0.5, you might find that (e.g.) 10.500000000000000001 has been stored as 10.5.

So (num % 0.5 == 0.0) will definitely be true if num is a multiple of 0.5, but it might also be true if num is a slightly inaccurate representation of a number that is close to a multiple of 0.5.
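
To make that concrete, here is a minimal sketch that pulls the question's check out of the setter so it can be called directly. The class name HalfMultipleDemo and the helper isHalfMultiple are made up for illustration; only the expression num % 0.5 == 0.0 comes from the question.

```java
public class HalfMultipleDemo {
    // Same test as the question's setter, extracted into a helper (hypothetical name).
    static boolean isHalfMultiple(double num) {
        return num % 0.5 == 0.0;
    }

    public static void main(String[] args) {
        // Exact multiples of 0.5 are accepted. For -3.0 the remainder is -0.0,
        // which still compares equal to 0.0.
        System.out.println(isHalfMultiple(10.5));   // true
        System.out.println(isHalfMultiple(-3.0));   // true

        // This literal cannot be stored as a double. It is rounded to 10.5
        // before % ever sees it, so it is accepted as well.
        System.out.println(10.500000000000000001 == 10.5);          // true
        System.out.println(isHalfMultiple(10.500000000000000001));  // true

        // A value far enough from a multiple of 0.5 to survive rounding is rejected.
        System.out.println(isHalfMultiple(10.5000001));             // false
    }
}
```

So the false positives come from how the input was rounded when it became a double, not from the % operation itself.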

CodePudding user response:

Java’s % operator never introduces any rounding error of its own, because the exact remainder is always representable in the result type.

The Java Language Specification, Java SE 11 Edition, 15.17.3 defines % for cases not involving NaNs, infinities, or zeros:

In the remaining cases, where neither an infinity, nor a zero, nor NaN is involved, the floating-point remainder r from the division of a dividend n by a divisor d is defined by the mathematical relation r = n - (d ⋅ q) where q is an integer that is negative only if n/d is negative and positive only if n/d is positive, and whose magnitude is as large as possible without exceeding the magnitude of the true mathematical quotient of n and d.

Thus the magnitude of r is not greater than the magnitude of n (because we subtract from n some d ⋅ q that is no larger than n in magnitude and that is zero or has the same sign as n) and is less than the magnitude of d (because otherwise q could be one larger in magnitude). This means r is at least as fine as n and d: its exponent is at least as small as n’s exponent and as d’s exponent. Since q is an integer, both n and d ⋅ q are whole multiples of the lowest bit position of n or of d, whichever is finer, so n - (d ⋅ q) has no significant bits below the lowest bit position available to r. Therefore, no significant bits lie beyond the point where r would have to be rounded, nothing is lost in rounding, and r is an exact result.
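
If you want to check this empirically, one option is to compare n % d against a remainder computed in exact decimal arithmetic. The sketch below is only an illustration: the class name ExactRemainderCheck, the helper remainderIsExact, and the sample values are all made up for this example. It relies on new BigDecimal(double) converting the stored double exactly and on BigDecimal.remainder truncating the quotient toward zero, which matches the definition quoted above. It assumes n and d are finite and d is nonzero.

```java
import java.math.BigDecimal;

public class ExactRemainderCheck {
    // Returns true when the double result of n % d equals the exact
    // mathematical remainder of the two stored double values.
    // Assumption: n and d are finite and d is nonzero.
    static boolean remainderIsExact(double n, double d) {
        // new BigDecimal(double) converts the binary value exactly, with no rounding.
        BigDecimal exact = new BigDecimal(n).remainder(new BigDecimal(d));
        BigDecimal computed = new BigDecimal(n % d);
        // compareTo ignores scale, so 0 and 0.0 (and -0.0 converted to 0) compare equal.
        return exact.compareTo(computed) == 0;
    }

    public static void main(String[] args) {
        // Arbitrary sample inputs, chosen only for illustration.
        double[][] cases = { {10.5, 0.5}, {0.7, 0.1}, {-7.3, 2.1}, {1e17, 0.3}, {5.0, 3.0} };
        for (double[] c : cases) {
            System.out.println(c[0] + " % " + c[1] + " = " + (c[0] % c[1])
                    + ", exact: " + remainderIsExact(c[0], c[1]));
        }
    }
}
```

By the argument above, every case should report exact: true. Note that "exact" here is relative to the doubles actually stored: 0.7 and 0.1 are already rounded when they are parsed, so 0.7 % 0.1 is the exact remainder of those two rounded values, not of the decimal numbers 0.7 and 0.1.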
