In Go (the language I'm most familiar with), the result of a mathematical operation is always the same data type as the operands, meaning if the operation overflows, the result will be incorrect. For example:
func main() {
var a byte = 100
var b byte = 9
var r byte = (a << b) >> b
fmt.Println(r)
}
This prints 0, as all the bits are shifted out of the bounds of a byte
during the initial << 9
operation, then zeroes are shifted back in during the >> 9
operation.
However, this isn't the case in C:
int main() {
unsigned char a = 100;
unsigned char b = 9;
unsigned char r = (a << b) >> b;
printf("%d\n", r);
return 0;
}
This code prints 100. Although this yields the "correct" result, this is unexpected to me, as I'd only expect promotion if one of the operands were larger than a byte, but in this case all operands are bytes. It's as though the temporary variable holding the result of the << 9
operation is larger than the resulting variable, and is only downcast back to a byte after the full RHS is evaluated, and thus after the >> 9
operation restores the bits.
Obviously, if explicitly storing the result of the >> 9
into a byte before continuing, you get the same result as in Go:
int main() {
unsigned char a = 100;
unsigned char b = 9;
unsigned char c = a << b;
unsigned char r = c >> b;
printf("%d\n", r);
return 0;
}
This isn't merely the case with bitwise operators. I've tested with multiplication/division too, and it demonstrates the same behaviour.
My question is: is this behaviour of C defined? If so, where? Does it actually use a specific data type for the interim values of a complex expression? Or is this actually undefined behaviour, like an incidental result of the operations being performed in a 32/64 bit CPU register before being saved back to memory?
CodePudding user response:
Welcome to integer promotions! One behavior of the C language (an often criticized one, I'd add) is that types like char
and short
are promoted to int
before doing any arithmetic operation with them, and the result is also int
. What does this mean?
unsigned char foo(unsigned char x) {
return (x << 4) >> 4;
}
int main(void) {
if (foo(0xFF) == 0x0F) {
printf("Yay!\n");
}
else {
printf("... hey, wait a minute!\n");
}
return 0;
}
Needless to say, the above code prints ... hey, wait a minute!
. Let's discover why:
// this line of code:
return (x << 4) >> 4;
// is converted to this (because of integer promotion):
return ((int) x << 4) >> 4;
Therefore, this is what happens:
x
isunsigned char
(8-bit) and its value is0xFF
,x << 4
needs to be executed, but firstx
is converted toint
(32-bit),x << 4
becomes0x000000FF << 4
, and the result0x00000FF0
is alsoint
,0x00000FF0 >> 4
is executed, yielding0x000000FF
,- finally,
0x000000FF
is converted tounsigned char
(because that's the return value offoo()
), so it becomes0xFF
, - and that's why
foo(0xFF)
yields0xFF
instead of0x0F
.
How to prevent this? Simple: convert the result of x << 4
to unsigned char
. In the previous example, 0x00000FF0
would have become 0xF0
.
unsigned char foo(unsigned char x) {
return ((unsigned char) (x << 4)) >> 4;
}
foo(0xFF) == 0x0F
NOTE: in the previous examples, it is assumed that unsigned char
is 8 bits and int
is 32 bits, but the examples work for basically any situation in which CHAR_BIT == 8
(because C17 requires that sizeof(int) * CHAR_BIT >= 16
).
P.S.: this answer is not as exhaustive as the C official standard document, of course. But you can find all the (valid and defined) behavior of C described in the latest draft of the ISO/IEC 9899:2018 standard (a.k.a. C17/C18).
CodePudding user response:
C 2018 6.5.7 discusses the shift operators. Paragraph 3 says:
The integer promotions are performed on each of the operands…
6.3.1.1 2 specifies the integer promotions:
… If an
int
can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to anint
; otherwise, it is converted to anunsigned int
. These are called the integer promotions. All other types are unchanged by the integer promotions.
Thus in a << b
where a
and b
are unsigned char
, a
is promoted to int
, which is at least 16 bits. (A C implementation may define unsigned char
to be more than eight bits. It could be the same width as int
. In this case, the integer promotions would not convert a
or b
.)
Note that if the integer promotions were not applied, the behavior of evaluating a << b
with b
equal to 9 would not be defined by the C standard, as the behavior of the shift operators is not defined for shift amounts greater than or equal to the width of the left operator.
6.5.5 specifies the multiplicative operators. Paragraph 3 says:
The usual arithmetic conversions are performed on the operands.
6.3.1.8 specifies the usual arithmetic conversions:
… First, if the corresponding real type of either operand is
long double
, the other operand is converted, without change of type domain [complex or real], to a type whose corresponding real type islong double
.Otherwise, if the corresponding real type of either operand is
double
, the other operand is converted, without change of type domain, to a type whose corresponding real type isdouble
.Otherwise, if the corresponding real type of either operand is
float
, the other operand is converted, without change of type domain, to a type whose corresponding real type isfloat.
Otherwise, the integer promotions are performed on both operands. Then the following rules are applied to the promoted operands:
If both operands have the same type, then no further conversion is needed.
Otherwise, if both operands have signed integer types or both have unsigned integer types, the operand with the type of lesser integer conversion rank is converted to the type of the operand with greater rank.
Otherwise, if the operand that has unsigned integer type has rank greater or equal to the rank of the type of the other operand, then the operand with signed integer type is converted to the type of the operand with unsigned integer type.
Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, then the operand with unsigned integer type is converted to the type of the operand with signed integer type.
Otherwise, both operands are converted to the unsigned integer type corresponding to the type of the operand with signed integer type.
Rank has a technical definition that largely corresponds to width (number of bits in an integer type).
Thus, in a * b
where a
and b
are unsigned char
, they are both promoted to int
(with the caveat above about wide unsigned char
) and no further conversions are necessary. If one operand were wider than int
, say long long int
, while the other is unsigned char
then both operands would be converted to that wider type.