I want to make sure I understand exactly under what circumstances the <<
and >>
operators in C produce Undefined Behavior. This is my current understanding:
Let...:
x_t
be the type ofx
after integer promotionN
be the bitwidth ofx
after integer promotionM
be the number of 0s to the left of the most-significant 1 bit in the representation ofx
after integer promotion
x << y
is UB if any of the following:
x < 0
(even ify == 0
)y < 0
y >= N
x_t
is a signed type andy >= M
x >> y
is UB if any of the following:
y < 0
y >= N
...and is implementation defined if:
x < 0
If I have this understanding correct, it would imply the following:
unsigned short x = 1;
x << 31;
This would be undefined behavior in the case where int
is 32 bits and short
is 16 (because x
would be promoted to int
, and the left shift by 31 would put the 1 bit into position 31), but it would be defined behavior in the case where int
and short
are both 32 bits (because x
would be promoted to an unsigned int
and 31 < 32).
CodePudding user response:
Yes.
I find your definition of M
a little weak. Specifically, it wasn't clear to me if you were including the sign bit.
But yes, the interpretation is correct.
- The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand. If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.
y < 0
⇒ UBy >= N
⇒ UB
The result of
E1 << E2
isE1
left-shiftedE2
bit positions; vacated bits are filled with zeros. IfE1
has an unsigned type, the value of the result isE1
× 2E2
, reduced modulo one more than the maximum value representable in the result type. IfE1
has a signed type and nonnegative value, andE1
× 2E2
is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
This paragraph is poorly worded.
There's no doubt that the behaviour of E1 << E2
isn't defined when E1
× 2E2
isn't representable.
- For
x << y
,x_t
is a signed type andy >= M
⇒ UB[1]
But what about when "E1
has a signed type and nonnegative value" is false? 3 << 2
is clearly not undefined behaviour, so that means neither the "then" not the "otherwise" clauses apply when this is false, so that means the spec is silent on the behaviour of -3 << 2
. It's literally behaviour that's not defined by the spec. So,
- For
x << y
,x < 0
⇒ UB
The result of
E1 >> E2
isE1
right-shiftedE2
bit positions. IfE1
has an unsigned type or ifE1
has a signed type and a nonnegative value, the value of the result is the integral part of the quotient ofE1
/2E2
. IfE1
has a signed type and a negative value, the resulting value is implementation-defined.
- For
x >> y
,x < 0
⇒ Implementation defined
- We need to consider not just two's-complement, but ones' complement and sign-magnitude in assessing the validity of this interpretation.
CodePudding user response:
Do I have correct understanding of Undefined Behavior for shift operators in C?
Yes, whereas the M
part is a little vague.
While the any of the following:
lists some examples, it does not exhaust the whole possible space of behaviors. INT_MAX << 1
is also UB. The rule is x * 2**y <= X_T_MAX
, where X_T_MAX
is the maximum representable value in type x_t
. That is a 2D plane of allowed numbers.
CodePudding user response:
What you have written here is far more cumbersome to comprehend than the actual standard... the TL;DR of C17 6.5.7 can be summarized as:
- UB: Don't left shift variables containing negative values.
- UB: Don't left shift data into the sign bit of a signed operand (the type obtained after promotion).
- UB: Don't shift by a negative or too large shift count.
- Right-shifting variables containing negative values gives implementation-defined behavior in the form of either logical or arithmetic shift. Non-portable.
Done, that's it. No need to make things more complicated.
The golden rule is: never perform bitwise arithmetic on signed types ever. Abide it and you'll avoid a number of well-known bugs.