Is there a standard binary representation of integer data types in C++20?

I understand that with C++20, the sign-magnitude and ones' complement representations are finally being phased out in favor of standardizing two's complement (see http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0907r3.html and http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1236r1.html). I was wondering what this means for how much we can now assume about the binary representation of integers in C++20. As I read it, a lot of thought has gone into the allowed ranges, but I don't see anything that places requirements on the bit layout or on endianness. I would therefore assume that endianness is still an issue, but what about bit layout?

According to the standard, is 0b00000001 == 1 always true for an int8_t? What about 0b11111111 == -1?

I understand that on nearly all practical systems the leftmost bit is the most significant, with significance decreasing until the rightmost, least significant bit is reached, and every system I've tested uses this representation. But does the standard say anything about this, and what guarantees do we get? Or, if we need to know the underlying representation, would it be safer to use a 256-element lookup table that explicitly maps each value a byte can hold to a specific bit representation, rather than relying on this? I'd rather not take the performance hit of a lookup if I can use the bytes directly as is, but I'd also like to make sure my code isn't making too many assumptions, since portability is important.

Answer:

The sign bit is required to be the most significant bit (§[basic.fundamental]/3):

For each value x of a signed integer type, the value of the corresponding unsigned integer type congruent to x modulo 2^N has the same value of corresponding bits in its value representation.

This only works if the sign bit occupies the position that would be the MSB in the corresponding unsigned type.

This also requires that (for example) uint8_t x = -1; will set x to 0b11111111 (since -1 reduced modulo 2⁸ is 255). In fact, that's used as an example in the standard:

[Example: The value −1 of a signed integer type has the same representation as the largest value of the corresponding unsigned type. —end example]

As for an offset representation, I believe it's ruled out. The C++ standard refers to the C standard, which requires (§6.2.6.2/1):

If there are N value bits, each bit shall represent a different power of 2 between 1 and $2^{N-1}$, so that objects of that type shall be capable of representing values from 0 to $2^N - 1$ using a pure binary representation;

"using a pure binary representation" is at least normally interpreted as meaning a representation like:

$b_{N-1} b_{N-2} \ldots b_2 b_1 b_0$, with value $\sum_{i=0}^{N-1} b_i 2^i$.

I.e., where, if you count bits from 0 through N-1, each bit represents the corresponding power of 2.

Answer:

The C++20 standard requires that signed integers work as follows:

For each value x of a signed integer type, the value of the corresponding unsigned integer type congruent to x modulo 2^N has the same value of corresponding bits in its value representation.

This is how two's complement is defined (there's even a footnote telling you that's what this means). This does not allow for the sign bit to appear anywhere except the highest bit in the value representation of the signed integer. And this does not allow for conversions to the unsigned equivalent to move that bit anywhere other than the highest bit in the value representation of the unsigned equivalent.

Two's complement means two's complement.

According to the standard, is 0b00000001 == 1 always true for an int8_t? What about 0b11111111 == -1?

In terms of representation, this has been true since C++11, because the exact-width signed integer types (int8_t, int16_t, and so on) were always required to use two's complement, even when signed char was not. Of course, these types are only optionally supported, so if you wanted maximum portability, you couldn't rely on them.