I bumped into this while writing a program trying to print the constituent byte values of UTF-8 characters. This is the program that I wrote to test the various ~0 operations:
#include <stdio.h>

int main()
{
    printf("%x\n", (char)~0);                               // ffffffff
    printf("%x\n", (unsigned char)~0);                      // ff
    printf("%d\n", sizeof(char) == sizeof(unsigned char));  // 1
    printf("%d\n", sizeof(char) == sizeof(unsigned int));   // 0
    printf("%d\n", (char)~0 == (unsigned int)~0);           // 1
}
I'm struggling to understand why char would produce an int-sized value, when unsigned char produces a char-sized value.
CodePudding user response:
When passing a type smaller than int to a variadic function like printf, it gets promoted to type int.

In the first case, you're passing a char with value -1, whose representation (assuming 2's complement) is 0xff. This is promoted to an int with value -1 and representation 0xffffffff, so this is what is printed.

In the second case, you're passing an unsigned char with value 255, whose representation is 0xff. This is promoted to an int with value 255 and representation 0x000000ff, so this is what is printed (without the leading zeros).
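As a minimal sketch of this (assuming a two's-complement implementation where plain char is signed), printing the same two values with %d makes the promotion itself visible:

#include <stdio.h>

int main(void)
{
    char sc = (char)~0;                    /* bits 0xff, value -1 (char is signed here) */
    unsigned char uc = (unsigned char)~0;  /* bits 0xff, value 255 */

    /* Both arguments are promoted to int before printf sees them,
       so %d shows the promoted values directly. */
    printf("%d\n", sc);  /* -1  : sign-extended to 0xffffffff */
    printf("%d\n", uc);  /* 255 : zero-extended to 0x000000ff */
}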
CodePudding user response:
In both of these calls

printf("%x\n", (char)~0);          // ffffffff
printf("%x\n", (unsigned char)~0); // ff

the expressions (char)~0 and (unsigned char)~0 are converted to the type int due to the integer promotions.

On the system used, the type char behaves as the type signed char. So the sign bit in the first expression is propagated when the expression is promoted to the type int.

On the other hand, before the integer promotions the expression (unsigned char)~0 has the type unsigned char due to the cast to the unsigned type. So no sign bit is propagated when the expression is promoted to the type int.

Pay attention to the fact that the conversion specifier x expects an argument of the type unsigned int. So the first call of printf should be written like

printf("%x\n", (unsigned int)(char)~0);
CodePudding user response:
They do not produce values of different widths. They produce values with different numbers of set bits in them.
In your C implementation, it appears int is 32 bits and char is signed. I will use these in this answer, but readers should note the C standard allows other choices. I will use hexadecimal to denote the bits that represent values.
In (char)~0, 0 is an int. ~0 then has bits FFFFFFFF. In a 32-bit two’s complement int, this represents −1. (char) converts this to a char.
At this point, we have a char with value −1, represented with bits FF. When that is passed as an argument to printf, it is automatically converted to an int. Since its value is −1, it is converted to an int with value −1. The bits representing that int are FFFFFFFF. You ask printf to format this with %x. Technically, that is a mistake; %x is for unsigned int, but your printf implementation formats the bits FFFFFFFF as if they were an unsigned int, producing output of “ffffffff”.
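Spelled out step by step as a minimal check (same 32-bit two’s complement assumptions), the conversion printf performs on the argument looks like this:

#include <stdio.h>

int main(void)
{
    char c = (char)~0; /* bits FF, value -1 */
    int i = c;         /* the same conversion the default argument promotion performs */
    /* Casting to unsigned int gives %x a correctly typed argument. */
    printf("%x\n", (unsigned int)i); /* ffffffff: the sign bit was propagated */
}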
In (unsigned char)~0, ~0 again has value −1 represented with bits FFFFFFFF, but now the cast is to unsigned char. Conversion to an unsigned integer type wraps modulo M, where M is one more than the maximum value of the type, so 256 for an eight-bit unsigned char. Mathematically, the conversion is −1 + 1·256 = 255, which is the starting value plus the multiple of 256 needed to bring the value into the range of unsigned char. The result is 255. Practically, it is implemented by taking the low eight bits, so FFFFFFFF becomes FF. However, in unsigned char, the bits FF represent 255 instead of −1.
Now we have an unsigned char with value 255, represented with bits FF. Passing that to printf results in automatic conversion to an int. Since its unsigned char value is 255, the result of conversion to int is 255. When you ask printf to format this with %x (which is a mistake, as above), printf formats it as if the bits were an unsigned int, producing output of “ff”.
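If the intent is to print only the byte, two idioms avoid the type mismatch entirely (a sketch of mine, not from this answer): casting to unsigned char first wraps the value to 255 before the promotion, and the C99 length modifier hh tells printf to convert the promoted argument back to unsigned char before formatting.

#include <stdio.h>

int main(void)
{
    printf("%x\n", (unsigned int)(unsigned char)(char)~0); /* ff: wrap to 255 first */
    printf("%hhx\n", (char)~0);                            /* ff: hh converts back to unsigned char */
}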