Home > Software design >  Portable way to check whether a floating point division would end in -inf
Portable way to check whether a floating point division would end in -inf

Time:09-25

I have a floating point division of the form 1.0f / x with x as a float. How would I check beforehand whether the x is so close to 0.0f that the result would be -inf / undefined? I'm not sure if the epsilon from std limits is enough.

Regards.

CodePudding user response:

We could search for the limit using trial and error:

#include <iostream>
#include <limits>

#include <cmath>

int main() {
    float limit = 0.0f;
    float result = 1.0f / limit;
    while (
        result == std::numeric_limits<float>::infinity()
        or std::isnan(result)
    ) {
        limit = std::nextafter(limit, 1.0f);
        result = 1.0f / limit;
    }
    std::cout << "Limit = " << limit << std::endl;
    std::cout << "1.0f / Limit = " << 1.0f / limit << std::endl;
}

This outputs on my system:

Limit = 2.93874e-39
1.0f / Limit = 3.40282e 38

This is however, not a very efficient solution. If we could make this algorithm constexpr, this would mitigate the issue but unfortunately std::nextafter() isn't constexpr.

If you know your environment is using IEEE-754 airthmetic, these limits are probably constant, but as you ask for portability, we can't always assume that.

CodePudding user response:

Prerequisites

C does not mandate IEEE-754 or a particular rounding method. For this answer, I will assume IEEE-754 is used with a binary format and round-to-nearest, ties-to-even.

Conclusion

1/x overflows iff fabs(x) <= std::ldexp(1, -std::numeric_limits<float>::max_exponent). For a constant expression, you can use std::numeric_limits<float>::min()/4.

Discussion

Rounding at the end of the finite range is performed as if the exponent kept going. E.g., using decimal for illustration, if the highest representable finite number were 9.99•1017, the next representable number, if the exponent were not limited, would be 1.00•1018. The midpoint between those two is 9.995•1017, so numbers under that are rounded down and numbers above that are rounded up. With ties-to-even, 9.995•1017 is rounded up.

For a binary format, the greatest representable value is (2−ε)•2q, where ε is the “machine epsilon” (the ULP of 1, so 2-ε is the greatest representable significand) and q is the maximum exponent. Then the pointer where rounding occurs is (2−½ε)•2q.

If 1/x < (2−½ε)•2q, the result is rounded downward. Otherwise, it is rounded upward, to ∞. Thus, the result is less than ∞ iff x > 1/((2−½ε)•2q) = 2q/(2-½ε).

1/(2-½ε) is slightly greater than ½, by less than ½ε, so the greatest representable value less than or equal to it is ½. Thus, the result of 1/x is less than ∞ iff x > 2q/2 = 2q−1.

C tells us the maximum exponent with std::numeric_limits<double>::max_exponent (defined in the header <limits>). However, C calibrates this exponent for a significand range of [½, 1) instead of IEEE-754’s [1, 2), so it is one greater than q. Thus the −q−1 we want is simply -std::numeric_limits<double>::max_exponent.

We can calculate 2q−1 with the ldexp function (declared in <cmath>): std::ldexp(1, -std::numeric_limits<float>::max_exponent).

With Apple Clang 11, this program:

#include <cmath>
#include <iomanip>
#include <iostream>
#include <limits>


int main(void)
{
    float x = std::ldexp(1, -std::numeric_limits<float>::max_exponent);

    std::cout << std::setprecision(20) << x << " is too small, result will overflow:\n";
    std::cout << "\t" << 1/x << ".\n";

    x = std::nexttoward(x, INFINITY);

    std::cout << std::setprecision(20) << x << " is just big enough, result will not overflow:\n";
    std::cout << "\t" << 1/x << ".\n";
}

produces:

2.9387358770557187699e-39 is too small, result will overflow:
    inf.
2.9387372783541830947e-39 is just big enough, result will not overflow:
    3.4028220466166163425e 38.

Accounting for negative numbers as well, 1/x overflows iff fabs(x) <= std::ldexp(1, -std::numeric_limits<float>::max_exponent).

Because of the way IEEE-754 specifies the exponent range, std::ldexp(1, -std::numeric_limits<float>::max_exponent) equals std::numeric_limits<float>::min()/4. (IEEE-754 specifies the minimum normal exponent to be 1−q, so the −q−1 we desire is (1-q)-2.)

  • Related