C uses different data type for arithmetic in the middle of an expression?

Time:09-18

In Go (the language I'm most familiar with), the result of a mathematical operation always has the same data type as the operands, meaning that if the operation overflows, the result wraps around and is not what you'd mathematically expect. For example:

package main

import "fmt"

func main() {
    var a byte = 100
    var b byte = 9
    var r byte = (a << b) >> b
    fmt.Println(r)
}

This prints 0, as all the bits are shifted out of the bounds of a byte during the initial << 9 operation, then zeroes are shifted back in during the >> 9 operation.

However, this isn't the case in C:

#include <stdio.h>

int main(void) {
    unsigned char a = 100;
    unsigned char b = 9;
    unsigned char r = (a << b) >> b;
    printf("%d\n", r);
    return 0;
}

This code prints 100. Although this yields the "correct" result, this is unexpected to me, as I'd only expect promotion if one of the operands were larger than a byte, but in this case all operands are bytes. It's as though the temporary variable holding the result of the << 9 operation is larger than the resulting variable, and is only downcast back to a byte after the full RHS is evaluated, and thus after the >> 9 operation restores the bits.

Obviously, if you explicitly store the result of the << 9 in a byte before continuing, you get the same result as in Go:

#include <stdio.h>

int main(void) {
    unsigned char a = 100;
    unsigned char b = 9;
    unsigned char c = a << b;
    unsigned char r = c >> b;
    printf("%d\n", r);
    return 0;
}

This isn't merely the case with bitwise operators. I've tested with multiplication/division too, and it demonstrates the same behaviour.

My question is: is this behaviour of C defined? If so, where? Does it actually use a specific data type for the interim values of a complex expression? Or is this actually undefined behaviour, like an incidental result of the operations being performed in a 32/64 bit CPU register before being saved back to memory?

Answer:

Welcome to integer promotions! One behavior of the C language (an often criticized one, I'd add) is that types narrower than int, like char and short, are promoted to int (or to unsigned int, if int cannot represent all their values) before any arithmetic operation is done with them, so the result is int as well. What does this mean?

#include <stdio.h>

unsigned char foo(unsigned char x) {
  return (x << 4) >> 4;
}

int main(void) {
  if (foo(0xFF) == 0x0F) {
    printf("Yay!\n");
  }
  else {
    printf("... hey, wait a minute!\n");
  }

  return 0;
}

Needless to say, the above code prints "... hey, wait a minute!". Let's find out why:

// this line of code:
return (x << 4) >> 4;

// is converted to this (because of integer promotion):
return ((int) x << 4) >> 4;

Therefore, this is what happens:

  • x is unsigned char (8-bit) and its value is 0xFF,
  • x << 4 needs to be executed, but first x is converted to int (32-bit),
  • x << 4 becomes 0x000000FF << 4, and the result 0x00000FF0 is also int,
  • 0x00000FF0 >> 4 is executed, yielding 0x000000FF,
  • finally, 0x000000FF is converted to unsigned char (because that's the return value of foo()), so it becomes 0xFF,
  • and that's why foo(0xFF) yields 0xFF instead of 0x0F.

How to prevent this? Simple: convert the result of x << 4 to unsigned char. In the previous example, 0x00000FF0 would have become 0xF0.

unsigned char foo(unsigned char x) {
  return ((unsigned char) (x << 4)) >> 4;
}

Now foo(0xFF) == 0x0F.

NOTE: the previous examples assume that unsigned char is 8 bits and int is 32 bits, but they work in basically any situation in which CHAR_BIT == 8, because C17 requires int to be at least 16 bits wide (INT_MAX must be at least 32767).

P.S.: this answer is not as exhaustive as the C official standard document, of course. But you can find all the (valid and defined) behavior of C described in the latest draft of the ISO/IEC 9899:2018 standard (a.k.a. C17/C18).

Answer:

C 2018 6.5.7 discusses the shift operators. Paragraph 3 says:

The integer promotions are performed on each of the operands…

6.3.1.1 2 specifies the integer promotions:

… If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions.

Thus in a << b where a and b are unsigned char, both are promoted to int, which is at least 16 bits wide. (A C implementation may define unsigned char to be more than eight bits; it could even be the same width as int. In that case int cannot represent all values of unsigned char, so the integer promotions would convert a and b to unsigned int instead.)

Note that if the integer promotions were not applied, evaluating a << b with b equal to 9 would have undefined behavior, because the C standard does not define the shift operators for shift amounts greater than or equal to the width of the (promoted) left operand.

6.5.5 specifies the multiplicative operators. Paragraph 3 says:

The usual arithmetic conversions are performed on the operands.

6.3.1.8 specifies the usual arithmetic conversions:

… First, if the corresponding real type of either operand is long double, the other operand is converted, without change of type domain [complex or real], to a type whose corresponding real type is long double.

Otherwise, if the corresponding real type of either operand is double, the other operand is converted, without change of type domain, to a type whose corresponding real type is double.

Otherwise, if the corresponding real type of either operand is float, the other operand is converted, without change of type domain, to a type whose corresponding real type is float.

Otherwise, the integer promotions are performed on both operands. Then the following rules are applied to the promoted operands:

  • If both operands have the same type, then no further conversion is needed.

  • Otherwise, if both operands have signed integer types or both have unsigned integer types, the operand with the type of lesser integer conversion rank is converted to the type of the operand with greater rank.

  • Otherwise, if the operand that has unsigned integer type has rank greater or equal to the rank of the type of the other operand, then the operand with signed integer type is converted to the type of the operand with unsigned integer type.

  • Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, then the operand with unsigned integer type is converted to the type of the operand with signed integer type.

  • Otherwise, both operands are converted to the unsigned integer type corresponding to the type of the operand with signed integer type.

Rank has a technical definition that largely corresponds to width (number of bits in an integer type).

Thus, in a * b where a and b are unsigned char, both are promoted to int (with the caveat above about wide unsigned char) and no further conversions are necessary. If one operand were wider than int, say long long int, while the other is unsigned char, then both operands would be converted to that wider type.
