Float to Double adding many 0s at the end of the new double number during conversion

Time:07-12

I'm facing a little problem on a personal project:

When I convert a float number to a double to make the arithmetic operations (+, -, *, /) easier, it adds a lot of 0s after the original float number.

For example: float number = -4.1112 -> double number = -4.1112000000000002

I convert the float to a double with the standard function std::stod().

This is a big problem for me because I'm checking for overflow in my project, and the check throws an exception on these values.

Here is the checkOverflow function that throws an exception:

{
    if (type == eOperandType::Int8) {
        if (value > std::numeric_limits<int8_t>::max() || value < std::numeric_limits<int8_t>::min())
            throw VMException("Overflow");
    } else if (type == eOperandType::Int16) {
        if (value > std::numeric_limits<int16_t>::max() || value < std::numeric_limits<int16_t>::min())
            throw VMException("Overflow");
    } else if (type == eOperandType::Int32) {
        if (value > std::numeric_limits<int32_t>::max() || value < std::numeric_limits<int32_t>::min())
            throw VMException("Overflow");
    } else if (type == eOperandType::Float) {
        if (value > std::numeric_limits<float>::max() || value < std::numeric_limits<float>::min())
            throw VMException("Overflow");
    } else if (type == eOperandType::Double) {
        if (value > std::numeric_limits<double>::max() || value < std::numeric_limits<double>::min())
            throw VMException("Overflow");
    }
}

CodePudding user response:

That's life I'm afraid. A floating point type can only represent a sparse subset of the real numbers.

Assuming IEEE754, the nearest float to -4.1112 is -4.111199855804443359375

The nearest double to -4.1112 is -4.111200000000000187583282240666449069976806640625
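To see these two values side by side, here is a minimal sketch. The helper names show and widen are made up for illustration and do not come from the question. It relies on the fact that widening a float to a double is exact, so printing the widened value with 17 significant digits exposes the digits the float was already holding. Note also that std::stod parses a string, so its result is the nearest double to the decimal text rather than a widened float.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: format a double with the given number of
// significant digits (default stream format, like printf's %g).
std::string show(double x, int digits) {
    std::ostringstream out;
    out << std::setprecision(digits) << x;
    return out.str();
}

// Hypothetical helper: widen a float to a double. The conversion is
// exact; the double holds the same binary value the float had.
double widen(float f) {
    return static_cast<double>(f);
}
```

With these, show(-4.1112, 17) gives the -4.1112000000000002 from the question, while show(widen(-4.1112f), 17) gives a different number, because the float and the double are rounded from -4.1112 at different precisions. Printing either with 7 significant digits gives -4.1112 again.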

If you need perfect decimal precision then use a decimal type. There's one in the Boost library distribution. Boost also has a numeric_cast function that does what your function is attempting to do, and does it cleverly, with minimal run-time overhead.

CodePudding user response:

The problem you are having is completely different.

All your checks are wrong. Think about it: if a variable is of type, say, int32_t, its value is necessarily between the minimum and maximum possible values that can be represented by an int32_t, by definition. Let's simplify: it's like having a single-digit number, and testing that it is between 0 and 9 (if it is unsigned), or between -9 and 9 (if it is signed): how could such a test fail? Your checks should never raise an exception. But, as you say, they do. How is it even possible? And anyway, why would it happen for the long series of zeros that derive from representing -4.1112 as a floating point number, turning it into -4.1112000000000002? That isn't an overflow! This is a strong hint that your problem is elsewhere.

The explanation is that std::numeric_limits<T>::min doesn't do what you think for floating-point types. As cppreference explains, it gives you the smallest positive normalized value:

For floating-point types with denormalization, min returns the minimum positive normalized value. Note that this behavior may be unexpected, especially when compared to the behavior of min for integral types. To find the value that has no values less than it, use numeric_limits::lowest.

And the page about lowest also provides an example, comparing the output of min, lowest and max:

std::numeric_limits<T>::min():
        float: 1.17549e-38 or 0x1p-126
        double: 2.22507e-308 or 0x1p-1022
std::numeric_limits<T>::lowest():
        float: -3.40282e+38 or -0x1.fffffep+127
        double: -1.79769e+308 or -0x1.fffffffffffffp+1023
std::numeric_limits<T>::max():
        float: 3.40282e+38 or 0x1.fffffep+127
        double: 1.79769e+308 or 0x1.fffffffffffffp+1023

And as you can see, min is positive. So the opposite of max is lowest.
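The same facts can be checked in code. This is only a small sketch; the function names minIsPositive and lowestIsNegatedMax are invented for illustration.

```cpp
#include <limits>

// Hypothetical helper: true when numeric_limits<T>::min() is a positive
// value (floating-point types) rather than the most negative one.
template <typename T>
constexpr bool minIsPositive() {
    return std::numeric_limits<T>::min() > T(0);
}

// Hypothetical helper: true when lowest() is exactly -max(), which holds
// for IEEE 754 float and double but not for two's-complement integers.
template <typename T>
constexpr bool lowestIsNegatedMax() {
    return std::numeric_limits<T>::lowest() == -std::numeric_limits<T>::max();
}
```

On an IEEE 754 platform, minIsPositive<float>() and minIsPositive<double>() are true while minIsPositive<int>() is false, which is exactly the asymmetry between the integer and floating-point branches of the question's function.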

So you are getting exceptions because your negative values are smaller than the smallest positive value. Or, in other words: because -4 is less than 0.0001. Which is correct. It's the test that is wrong!

You could fix that by using lowest... But then, what would your checks tell you? If they ever raised an exception, it would mean that the compiler and/or library that you are using are seriously broken. If that is what you are testing, ok. But honestly I think it will never happen, and you could just delete these tests, as they provide no real value.
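If you do keep the checks, a version based on lowest might look like the sketch below. It assumes the value being tested is held as a double, which is a guess, since the question does not show the function's signature. The name fitsIn is hypothetical and stands in for one branch of checkOverflow.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical replacement for one branch of the question's check:
// true when value lies within the finite range of the target type T.
// Uses lowest() for the lower bound so negative values are accepted
// for floating-point targets too. The value is assumed to be a double.
template <typename T>
bool fitsIn(double value) {
    return value >= std::numeric_limits<T>::lowest() &&
           value <= std::numeric_limits<T>::max();
}
```

With this, fitsIn<float>(-4.1112) is true, so the value from the question no longer triggers the exception, while something like fitsIn<std::int8_t>(200.0) is still false.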
