What is the algorithm that a compiler would use while casting signed variables to larger variable ty

Time:02-03

The answer might be compiler dependent, but:

What is the expected output of the lines below?

signed char a = -5;
printf("%x \n", (signed short) a); 
printf("%x \n", (unsigned short) a);

Would a compiler fill the Most Significant Bits with zeros (0) or ones (1) while casting signed char to a larger variable? How and when?


P.S. There are other issues too. I ran the code below on an online compiler as a test, and the output was not what I expected. Adding explicit casts did not help. Why is the output of printf("%x \n", (signed char)b); 4 bytes long instead of 1?

#include <stdio.h>

int main(void)
{
    unsigned char a = (unsigned char)5;
    signed char b = (signed char)-5;
    
    unsigned short c;
    signed short d;
    
    c = (unsigned short)b;
    d = (signed short)b;
    
    printf("%x ||| %x ||| %x ||| %x\n", (unsigned char)a, (signed char)b, c, d);
    printf("%d ||| %d ||| %d ||| %d\n", a, b, c, d);
    printf("%d ||| %d ||| %d ||| %d\n", a, b, (signed char)c, (signed char)d);

    return 0;
}


Output:

5 ||| fffffffb ||| fffb ||| fffffffb
5 ||| -5 ||| 65531 ||| -5
5 ||| -5 ||| -5 ||| -5

CodePudding user response:

This question is C-specific, really. In C, arguments passed to the variadic part of a function (like printf) that have a type of lower rank than int undergo the default argument promotions and are converted to int. (They become unsigned int only if the argument type is unsigned and the same width as int.)

Converting a signed short or signed char to signed int does not change the value. If you start with -5, you end up with -5.

But if you convert a negative signed value to an unsigned type, the conversion is done modulo one more than the maximum value of the unsigned type. For example, the maximum value of an unsigned short is 65535 (on many implementations), so converting -5 to unsigned short results in -5 modulo 65536, which is 65531. (C's % operator does not produce mathematical modular reduction.) When that value is then implicitly converted to an int, it is still 65531, so that's what's printed with %x (fffb).

Note that it is technically incorrect to apply %x to a signed int. %x requires that the corresponding argument be an unsigned int. Currently, C does not guarantee what the result of interpreting a signed value as unsigned will be, but that will soon change. (It's not a conversion. At runtime, types no longer exist, and values are just bit patterns.)

CodePudding user response:

The exact rules for converting between signed and unsigned types are listed in section 6.3.1.3 of the C11 standard:

1 When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.

2 Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.

3 Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.

As for what the above means for this code:

signed char a = -5;
printf("%x \n", (signed short) a); 
printf("%x \n", (unsigned short) a);

There are a few things going on here.

For the first printf, you first have a conversion from signed char to signed short. By clause 1 above, since the value -5 can be stored in both, the value is unchanged by the cast. Then, because this value is passed to a variadic function, it is then promoted to type int, and again by clause 1 the value is unchanged.

Then the resulting int value is printed with the %x format specifier, which is expecting an unsigned int. This is technically undefined behavior for a mismatched format specifier, although most implementations will allow for implicit signed / unsigned reinterpretation. So assuming two's complement representation, the representation of the int value -5 will be printed, and assuming a 32 bit int this will be fffffffb.

For the second printf, the conversion from signed char to unsigned short will happen according to clause 2 above, since the value -5 can't be stored in an unsigned short. Assuming a 16 bit short, this gives you the value 65536 - 5 = 65531. And assuming two's complement representation, this is equivalent to sign-extending the representation from fb to fffb. This unsigned short value is then promoted to int when it is passed to printf, and by clause 1 the value is unchanged. Then the %x format specifier prints this as fffb.

CodePudding user response:

Conversions between integer types are value preserving when the value being converted is representable in the destination type. signed short can represent all values representable by signed char, so this ...

signed char a = -5;
printf("%hd\n", (signed short) a);

... would be expected to output a line containing "-5".

Your code, however, has undefined behavior. The conversion specifier %x requires the corresponding argument to have type unsigned int, whereas you are passing a signed short (converted to int according to the default argument promotions).

Provided that your implementation uses two's complement representation for signed integers (and I feel safe in asserting that it does), the representation will have sign-extended the original signed char to the width of a signed short, and then sign-extended that to the width of a (signed) int. Thus, one reasonably likely manifestation of the UB in your ...

printf("%x \n", (signed short) a); 

... would be to print

fffffffb

The other case is a bit different. Integer conversions where the target type is unsigned and cannot represent the source value are well defined. The source value is converted to the destination type by reducing it modulo the number of representable values in the target type. Thus, if your unsigned short has 16 value bits then the result of converting -5 to unsigned short is -5 modulo 65536, which is 65531.

Thus,

printf("%hu\n", (unsigned short) a);

would be expected to print a line containing "65531".

Again, the %x conversion specifier does not match the type of the corresponding argument ((unsigned short) a, converted to int via the default argument promotions), so your printf has undefined behavior. However, the conversion of a 16-bit unsigned short to a 32-bit int on a two's complement system will involve zero-extending the representation of the source, so one reasonably likely manifestation of the UB in your ...

printf("%x \n", (unsigned short) a);

... would be to print

fffb
