In chapter 3.4.2 of The C++ Programming Language, Bjarne Stroustrup says "The point of adding `int`s in a `double` would be to gracefully handle a number larger than the largest `int`." Are `double`s guaranteed to be able to hold the largest `int`s?
CodePudding user response:
It's not guaranteed.
If you assume `double` is in fact implemented as an IEEE 754 binary64 type (it should be), then it has a significand precision of 53 bits (that's the number of bits of integer precision it provides). Once you exceed 53 bits though, you'll start losing data (initially, it can only represent every other integer value, then every fourth value, then every eighth, etc., as it relies more and more on the exponent to scale the value).
On most systems, `int` is 32 bits or less, so a single `int` addition can't exceed the representational ability of the `double`. But there are systems on which `int` is 64 bits, and on those systems, even without addition getting involved, a large `int` value can overflow the representational precision of a `double`; you'll get something close to the right value, but it won't be exactly correct.
In practice, when this situation arises, you probably want to use `int64_t` or the like; `double` will be more portable (there's no guarantee a given system implements a 64-bit integer type), but it may be slower (on systems without a floating point coprocessor) and it will be inherently less precise than a true 64-bit integer type.
I suspect Bjarne Stroustrup's comment dates back to the days when virtually all systems had native integer handling of 32 bits or fewer, so:
- Not all of them provided a 64-bit integer type at all, and
- When they did provide a 64-bit integer type, it was implemented in software, with the compiler performing several 32-bit operations on paired 32-bit values to produce the equivalent of a 64-bit operation, making it much slower than a single floating point operation (assuming the system had a floating point coprocessor).
That sort of system still exists today mostly in the embedded development space, but for general purpose computers, it's pretty darn rare.
Alternatively, the computation in question may be one for which the result is likely to be huge (well beyond what even a 64-bit integer can hold) and some loss of precision is tolerated; an IEEE 754 binary64 type can technically represent values as high as 2^1023 (the gaps between representable values just get enormous at that point), and could usefully accumulate a sum of 32-bit integers (each large enough not to vanish to precision loss relative to the running total) into results whose bit counts run into the high two digits or low three digits.
CodePudding user response:
> Are `double`s guaranteed to be able to hold the largest `int`s?
No, primarily because the sizes and particular features of `double` and `int` are not guaranteed by the C standard.
The format commonly used for `double` is IEEE-754 “double precision,” also called binary64. The set of finite numbers this format represents is { M·2^e for integers M and e such that −2^53 < M < 2^53 and −1074 ≤ e ≤ 971 }. The largest set of consecutive integers in this set is the integers from −2^53 to 2^53, inclusive. 2^53 + 1 is not representable in this format.
Therefore, if `int` is 54 bits or fewer, so it has one sign bit and 53 or fewer value bits, every `int` value can be represented as a `double` in this format. If `int` is wider than 54 bits, it can represent 2^53 + 1 but this `double` format cannot.
CodePudding user response:
`int` is usually a 32-bit two’s complement integer, and `double` is usually a 64-bit double precision floating point value. A 32-bit `int` fits entirely within a `double`’s 53-bit significand, so on such systems any `int` value can be stored exactly in a `double` — but this is not guaranteed where `int` is wider.