Why doesn't this cause overflow?


#include <stdio.h>

int main() {
    double b = 3.14;
    double c = -1e20;
    
    c = -1e20 + b;
    
    return 0;
}

As far as I know, type "double" has 52 bits of fraction. To align 3.14's exponent with -1e20's, 3.14's fraction part would need more than 60 bits, which can never fit into 52 bits. In my understanding, the fraction bits beyond those 52, roughly 14 bits, would invade unassigned memory space, like this: (rough drawing)

So I examined the memory map in debug mode (gdb), suspecting that the bits next to the variables b or c would be corrupted, but I couldn't see any changes. What am I missing here?

CodePudding user response:

You are mixing up two very different things:

  • Buffer overflow/overrun

Your attached image shows what happens when you overflow a buffer, like defining a char[100] and writing to index 150. There the memory layout is important, as you might corrupt neighboring variables (see the first sketch after this list).

  • Overflow in values of a data type

What your code shows can only be an overflow of values. If you do int a = INT_MAX; a++; you get an integer overflow. This only affects the resulting value; it does not cause the variable to grow in size. An int will always stay an int. You do not invade any memory area outside your data type. Depending on the data type and architecture, the overflowing bits could just be chopped off, or some saturation could be applied to set the value to the maximum/minimum representable value (see the second sketch after this list).
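For illustration, here is a minimal sketch of the first case, a buffer overrun. Note that writing past the end of an array is undefined behavior in C; on many implementations it visibly corrupts a neighboring variable, but nothing is guaranteed, and the struct and names here are made up for the demonstration:

#include <stdio.h>

struct layout {
    char buf[8];    /* small buffer */
    int  neighbor;  /* variable that may sit right after it in memory */
};

int main(void) {
    struct layout l = { "", 42 };
    for (int i = 0; i < 12; i++)   /* writes 4 bytes past the end of buf */
        l.buf[i] = (char)0xFF;
    printf("neighbor = %d\n", l.neighbor);  /* likely no longer 42 */
    return 0;
}

And a sketch of the second case, overflow of a value. Unsigned arithmetic keeps the example well-defined (overflowing a signed int, such as INT_MAX + 1, is undefined behavior in C, even though on common hardware the bits simply wrap):

#include <stdio.h>
#include <limits.h>

int main(void) {
    unsigned int u = UINT_MAX;
    u = u + 1;   /* wraps around to 0; the variable does not grow */
    printf("u = %u, size still %zu bytes\n", u, sizeof u);
    return 0;
}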

I did not check your values, but without inspecting the value of c in a debugger or printing it, you cannot tell anything about the overflow there.
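For example (a minimal sketch, assuming double is IEEE-754 binary64 as on most current platforms), printing the sum with enough digits shows the value that actually got stored:

#include <stdio.h>

int main(void) {
    double c = -1e20 + 3.14;
    printf("c = %.17g\n", c);  /* 17 significant digits show every bit of a double */
    return 0;
}

On such a platform this prints c = -1e+20: the 3.14 was rounded away entirely, and no neighboring memory was touched.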

CodePudding user response:

Floating-point arithmetic is not defined to work by writing out all the bits of the operands, performing the arithmetic using all the bits involved, and storing those bits in memory. Rather, the way elementary floating-point operations work is that each operation is performed “as if it first produced an intermediate result correct to infinite precision and with unbounded range” and then rounded to a result that is representable in the floating-point format. That “as if” is important. It means that when computer processor designers are designing the floating-point arithmetic instructions, they figure out how to compute what the final rounded result would be. The processor does not always need to “write out” all the bits to do that.

Consider an example using decimal floating-point with four significant digits. If we add 6.543•10^20 and 1.037•10^17 (equal to 0.001037•10^20), the infinite-precision result would be 6.544037•10^20, and then rounding that to the nearest number representable in the four-significant-digit format would give 6.544•10^20. But we do not have to write out the infinite-precision result to compute that. We can compute that the result is 6.544•10^20 plus a tiny fraction, and then we can discard that fraction without actually writing out its digits. This is what processor designers do. The add, multiply, and other instructions compute the main part of a result, and they carefully manage information about the other parts to determine whether they would cause the result to round upward or downward in its last digit.

The resulting behavior is that, given any two operands in the format used for double, the computer always produces a result in that same format. It does not produce any extra bits.
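You can check this with the values from the question (a sketch assuming IEEE-754 binary64). The spacing between adjacent doubles near 1e20 is 2^14 = 16384, so adding 3.14 moves the infinitely precise sum by far less than half that spacing, and the rounded result is exactly -1e20 again:

#include <stdio.h>

int main(void) {
    double c = -1e20 + 3.14;
    /* 3.14 is far below half a unit in the last place of -1e20,
       so the sum rounds back to exactly -1e20. */
    printf("%s\n", c == -1e20 ? "equal" : "different");  /* prints "equal" */
    return 0;
}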

Supplement

There are 53 bits in the fraction portion of the format commonly used for double. (This is the IEEE-754 binary64 format, also called double precision.) The fraction portion is called the significand. (You may see it referred to as a mantissa, but that is an old term for the fraction portion of a logarithm. The preferred term is "significand"; significands are linear, while mantissas are logarithmic.) You may see some people describe the significand as having 52 bits, but that 52 refers to a field in the encoding of the floating-point value, and that field holds only part of the significand.

Mathematically, a floating-point representation is defined to be s•f•b^e, where b is a fixed numeric base, s provides a sign (+1 or −1), f is a number with a fixed number of digits p in base b, and e is an exponent within fixed limits. p is called the precision of the format, and it is 53 for the binary64 format. When this number is encoded into bits, the last 52 bits of f are stored in the significand field, which is where the 52 comes from. However, the first bit is also encoded, by way of the exponent field. Whenever the stored exponent field is not zero (or the special value of all one bits), it means the first bit of f is 1. When the stored exponent field is zero, it means the first bit of f is 0. So there are 53 bits present in the encoding.
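As a sketch of how those 53 bits are laid out (assuming double is IEEE-754 binary64 and using the common trick of copying the bytes into a 64-bit integer), the following prints the three encoded fields and reconstructs the full 53-bit significand by prepending the implicit leading 1 of a normal number:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    double d = 3.14;
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);  /* well-defined way to inspect the encoding */

    uint64_t sign     = bits >> 63;                    /* 1 bit   */
    uint64_t exponent = (bits >> 52) & 0x7FF;          /* 11 bits */
    uint64_t fraction = bits & 0xFFFFFFFFFFFFFULL;     /* the 52 stored bits */

    /* The exponent field is neither 0 nor all ones here, so the number is
       normal and the leading significand bit is an implicit 1. */
    uint64_t significand = (1ULL << 52) | fraction;

    printf("sign=%llu exponent=%llu fraction=0x%013llX\n",
           (unsigned long long)sign,
           (unsigned long long)exponent,
           (unsigned long long)fraction);
    printf("full 53-bit significand=0x%014llX\n",
           (unsigned long long)significand);
    return 0;
}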
