Why do `(char)~0` and `(unsigned char)~0` return values of different widths?

I bumped into this while writing a program trying to print the constituent byte values of UTF-8 characters.

This is the program that I wrote to test the various ~0 operations:

#include <stdio.h>

int main()
{
    printf("%x\n", (char)~0); // ffffffff
    printf("%x\n", (unsigned char)~0); // ff
    printf("%d\n", sizeof(char) == sizeof(unsigned char)); // 1
    printf("%d\n", sizeof(char) == sizeof(unsigned int)); // 0
    printf("%d\n", (char)~0 == (unsigned int)~0); // 1
}

I'm struggling to understand why char would produce an int-sized value, when unsigned char produces a char-sized value.

CodePudding user response：

When a value of a type smaller than int is passed to a variadic function like printf, it gets promoted to type int.

In the first case, you're passing a char with value -1 whose representation (assuming 2's complement) is 0xff. This is promoted to an int with value -1 and representation 0xffffffff, so this is what is printed.

In the second case, you're passing an unsigned char with value 255 whose representation is 0xff. This is promoted to an int with value 255 and representation 0x000000ff, so this is what is printed (without the leading zeros).
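The two cases above can be reproduced without printf. The following is a minimal sketch, assuming int is wider than char; the helper names are made up for illustration, and signed char is used instead of plain char so the result does not depend on whether char is signed.

```c
#include <assert.h>
#include <limits.h>

/* Illustrative helpers; the names are not from the answers above. */

/* Convert a signed char to int the way the default argument promotions do,
   then convert the result to unsigned int, the type %x expects. */
unsigned int bits_after_promotion_signed(signed char c)
{
    int promoted = c;               /* value preserved: -1 stays -1 */
    return (unsigned int)promoted;  /* -1 becomes UINT_MAX (all bits set) */
}

/* Same steps, starting from an unsigned char. */
unsigned int bits_after_promotion_unsigned(unsigned char c)
{
    int promoted = c;               /* value preserved: e.g. 255 stays 255 */
    return (unsigned int)promoted;  /* non-negative, so the value is unchanged */
}

/* Self-check of the two cases discussed above. */
void check_promotions(void)
{
    assert(bits_after_promotion_signed((signed char)-1) == UINT_MAX);
    assert(bits_after_promotion_unsigned((unsigned char)~0) == UCHAR_MAX);
}
```

With a 32-bit int and an 8-bit char, the first helper yields 0xffffffff and the second yields 0xff, matching the two printed lines.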

CodePudding user response：

In both of these calls

printf("%x\n", (char)~0); // ffffffff
printf("%x\n", (unsigned char)~0); // ff

the expressions (char)~0 and (unsigned char)~0 are converted to the type int due to the integer promotions.

On the system used, the type char behaves like the type signed char. So the sign bit of the expression (char)~0 is propagated when the expression is promoted to the type int.

On the other hand, before the integer promotions the expression (unsigned char)~0 has the type unsigned char because of the cast to the unsigned type. So no sign bit is propagated when the expression is promoted to the type int.
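Whether plain char behaves like signed char or unsigned char is implementation-defined, and it can be checked through CHAR_MIN from <limits.h>. The sketch below uses hypothetical function names and assumes int is wider than char; it shows how the promoted value of (char)~0 depends on that choice.

```c
#include <assert.h>
#include <limits.h>

/* Returns 1 if plain char is signed on this implementation, 0 otherwise. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* (char)~0 after promotion to int and conversion to unsigned int. */
unsigned int promoted_all_ones_char(void)
{
    char c = (char)~0;              /* -1 if char is signed, UCHAR_MAX if not */
    int promoted = c;               /* the integer promotion keeps the value */
    return (unsigned int)promoted;
}

/* Self-check: the sign bit is propagated only when char is signed. */
void check_char_signedness(void)
{
    unsigned int expected =
        plain_char_is_signed() ? UINT_MAX : (unsigned int)UCHAR_MAX;
    assert(promoted_all_ones_char() == expected);
}
```

On an implementation where char is unsigned, both calls in the question would print ff.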

Note that the conversion specifier x expects an argument of the type unsigned int. So the first call of printf should be written like

printf("%x\n", ( unsigned int )(char)~0);
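For the original goal of printing the byte values of UTF-8 text, one common approach is to convert each char to unsigned char before it is promoted, so no sign bit can be propagated. The sketch below is an assumption about what such a routine might look like, not code from the question; format_bytes_hex is a hypothetical helper that writes into a caller-supplied buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the bytes of s as space-separated two-digit
   hex values into out. Returns the number of characters written (not
   counting the terminator), or -1 if out is too small. */
int format_bytes_hex(const char *s, char *out, size_t out_size)
{
    size_t used = 0;

    assert(s != NULL && out != NULL && out_size > 0);
    out[0] = '\0';

    for (size_t i = 0; s[i] != '\0'; i++) {
        /* Go through unsigned char first so the value is never negative. */
        unsigned int byte = (unsigned char)s[i];
        const char *sep = (i == 0) ? "" : " ";
        int n = snprintf(out + used, out_size - used, "%s%02x", sep, byte);

        if (n < 0 || (size_t)n >= out_size - used)
            return -1;              /* encoding error or output truncated */
        used += (size_t)n;
    }
    return (int)used;
}
```

For example, calling it on the two-byte UTF-8 encoding of é ("\xc3\xa9") with a large enough buffer produces the text c3 a9 rather than ffffffc3 ffffffa9.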

CodePudding user response：

They do not produce values of different widths. They produce values with different numbers of set bits in them.

In your C implementation, it appears int is 32 bits and char is signed. I will use these in this answer, but readers should note the C standard allows other choices.

I will use hexadecimal to denote the bits that represent values.

In (char)~0, 0 is an int. ~0 then has bits FFFFFFFF. In a 32-bit two’s complement int, this represents −1. (char) converts this to a char.

At this point, we have a char with value −1, represented with bits FF. When that is passed as an argument to printf, it is automatically converted to an int. Since its value is −1, it is converted to an int with value −1. The bits representing that int are FFFFFFFF. You ask printf to format this with %x. Technically, that is a mistake; %x is for unsigned int, but your printf implementation formats the bits FFFFFFFF as if they were an unsigned int, producing output of “ffffffff”.

In (unsigned char)~0, ~0 again has value −1 represented with bits FFFFFFFF, but now the cast is to unsigned char. Conversion to an unsigned integer type wraps modulo M, where M is one more than the maximum value of the type, so 256 for an eight-bit unsigned char. Mathematically, the conversion is −1 + 1•256 = 255, which is the starting value plus the multiple of 256 needed to bring the value into the range of unsigned char. The result is 255. Practically, it is implemented by taking the low eight bits, so FFFFFFFF becomes FF. However, in unsigned char, the bits FF represent 255 instead of −1.
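The modulo-M rule can be checked with plain arithmetic. This is a minimal sketch with a hypothetical helper name, assuming UCHAR_MAX + 1 fits in a long (true for the usual eight-bit char); it computes the wrapped value without using a cast and compares it against what the cast produces.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: the value that conversion to unsigned char yields,
   computed by reducing the input modulo M = UCHAR_MAX + 1. */
long wrap_modulo_uchar(long value)
{
    const long M = (long)UCHAR_MAX + 1;  /* 256 when CHAR_BIT is 8 */
    long r = value % M;                  /* in C, % can give a negative result */

    if (r < 0)
        r += M;                          /* bring it into the range 0 .. M-1 */
    return r;
}

/* Self-check: the arithmetic agrees with the actual conversion. */
void check_wrap(void)
{
    assert(wrap_modulo_uchar(-1) == UCHAR_MAX);
    assert(wrap_modulo_uchar(~0) == (unsigned char)~0);
}
```

For the value in the question, wrap_modulo_uchar(-1) is -1 + 256 = 255 with an eight-bit char, the same result as (unsigned char)~0.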

Now we have an unsigned char with value 255, represented with bits FF. Passing that to printf results in automatic conversion to an int. Since its unsigned char value is 255, the result of conversion to int is 255. When you ask printf to format this with %x (which is a mistake as above), printf formats it as if the bits were an unsigned int, producing output of “ff”.