Do I have correct understanding of Undefined Behavior for shift operators in C?-CodePudding

I want to make sure I understand exactly under what circumstances the << and >> operators in C produce Undefined Behavior. This is my current understanding:

Let...:

x_t be the type of x after integer promotion
N be the bitwidth of x after integer promotion
M be the number of 0s to the left of the most-significant 1 bit in the representation of x after integer promotion

x << y is UB if any of the following:

x < 0 (even if y == 0)
y < 0
y >= N
x_t is a signed type and y >= M

x >> y is UB if any of the following:

y < 0
y >= N

...and is implementation defined if:

x < 0

If I have this understanding correct, it would imply the following:

    unsigned short x = 1;
    x << 31;

This would be undefined behavior in the case where int is 32 bits and short is 16 (because x would be promoted to int, and the left shift by 31 would put the 1 bit into position 31), but it would be defined behavior in the case where int and short are both 32 bits (because x would be promoted to an unsigned int and 31 < 32).

CodePudding user response：

Yes.

I find your definition of M a little weak. Specifically, it wasn't clear to me if you were including the sign bit.

But yes, the interpretation is correct.

The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand. If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.

y < 0 ⇒ UB
y >= N ⇒ UB

The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.

This paragraph is poorly worded.

There's no doubt that the behaviour of E1 << E2 isn't defined when E1 × 2^E2 isn't representable.

For x << y, x_t is a signed type and y >= M ⇒ UB^[1]

But what about when "E1 has a signed type and nonnegative value" is false? 3 << 2 is clearly not undefined behaviour, so that means neither the "then" not the "otherwise" clauses apply when this is false, so that means the spec is silent on the behaviour of -3 << 2. It's literally behaviour that's not defined by the spec. So,

For x << y, x < 0 ⇒ UB

The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1/2^E2. If E1 has a signed type and a negative value, the resulting value is implementation-defined.

For x >> y, x < 0 ⇒ Implementation defined

We need to consider not just two's-complement, but ones' complement and sign-magnitude in assessing the validity of this interpretation.

CodePudding user response：

Do I have correct understanding of Undefined Behavior for shift operators in C?

Yes, whereas the M part is a little vague.

While the any of the following: lists some examples, it does not exhaust the whole possible space of behaviors. INT_MAX << 1 is also UB. The rule is x * 2**y <= X_T_MAX, where X_T_MAX is the maximum representable value in type x_t. That is a 2D plane of allowed numbers.

CodePudding user response：

What you have written here is far more cumbersome to comprehend than the actual standard... the TL;DR of C17 6.5.7 can be summarized as:

UB: Don't left shift variables containing negative values.
UB: Don't left shift data into the sign bit of a signed operand (the type obtained after promotion).
UB: Don't shift by a negative or too large shift count.
Right-shifting variables containing negative values gives implementation-defined behavior in the form of either logical or arithmetic shift. Non-portable.

Done, that's it. No need to make things more complicated.

The golden rule is: never perform bitwise arithmetic on signed types ever. Abide it and you'll avoid a number of well-known bugs.