I'm trying to understand what happens when I store a pointer to a different data type in a char pointer. I understand everything except why these two lines:
…
char *cc;
long l;
…
cc = &l;
printf("\nc: %ld, cc: %u", *cc, cc);
are printing this:
c: 4294967211, cc: 591272048
instead of this->
c: 171 (0xAB), cc: 591272048?
Code:
#include <stdio.h>
int main()
{
char c = 65, *cc;
int i = 0x12345678;
long l = 0x12345AB;
float f = 3.14;
cc = &c;
printf("c: %c, cc: %u", *cc, cc);
cc = &i;
printf("\nc: %d, cc: %u", *cc, cc);
cc = &l;
printf("\nc: %ld, cc: %u", *cc, cc);
cc = &f;
printf("\nc: %f, cc: %u", *cc, cc);
}
Prints:
c: A, cc: 591272063
c: 120, cc: 591272056
c: 4294967211, cc: 591272048
c: 0.000000, cc: 4294967235
CodePudding user response:
You are invoking Undefined Behaviour, and get... undefined results! This output can happen if, on your target architecture:
- the char type is signed
- the int type is 32 bits long
- the long type is 64 bits long
- endianness is little endian
- negative numbers use 2's complement representation
Then when executing
cc = &l;
printf("\nc: %ld, cc: %u", *cc, cc);
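*cc is the lowest-addressed byte of l, which on a little-endian machine is its least significant byte, 0xAB (or -85 as a signed char). It is promoted to the int -85, whose bit pattern is 0xFFFFFFAB. You already depend on a lot of implementation-defined behaviour up to that point...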
But when you use %ld in printf, the program tries to read a little endian long when you only passed a char value which was promoted to int. The 32 highest order bits happen to be 0 here, but that part is UB and anything could happen. As a long, 0xFFFFFFAB is indeed 4 294 967 211.
But as you invoked UB, you cannot rely on that...
TL;DR: your code invokes Undefined Behaviour and the result could be anything.
CodePudding user response:
There are many things wrong with this program; it contains multiple forms of undefined behavior. I recommend that you start compiling with the settings from: What compiler options are recommended for beginners learning C?
First of all, most of these pointer assignments are invalid C: you cannot assign a pointer to one type to a pointer of a different, non-compatible type in C (see C17 6.5.16.1 and 6.5.4). You must use an explicit cast:
cc = (char*)&i;
Second, it is undefined behavior to use one particular conversion specifier in printf but pass an argument of a different type. Similarly, you must print pointers using %p, and you should cast the pointer to void* before you print it.
You should also place \n at the end of each printf call rather than at the start, since on line-buffered systems \n is often what flushes stdout and makes the output appear on the screen.
After correcting most of these cases of invalid C/undefined behavior/bugs except for still using wrong conversion specifiers, the code looks like this:
#include <stdio.h>
int main (void)
{
char c = 65, *cc;
int i = 0x12345678;
long l = 0x12345AB;
float f = 3.14;
cc = &c;
printf("c: %c, cc: %p\n", *cc, (void*)cc);
cc = (char*)&i;
printf("c: %d, cc: %p\n", *cc, (void*)cc);
cc = (char*)&l;
printf("c: %ld, cc: %p\n", *cc, (void*)cc);
cc = (char*)&f;
printf("c: %f, cc: %p\n", *cc, (void*)cc);
}
This code still invokes undefined behavior, so you are not guaranteed to get any particular result. When I run it on gcc x86_64 Linux I get the same output as you do however.
If we try to reason about what the compiler does here, then in all cases the character you pass to printf gets implicitly promoted to int. This is because printf is a "variadic function", and these functions come with a special rule of implicit argument promotion called "the default argument promotions". Meaning you might as well type (int)*cc and you would get the same results.
On a little endian computer as is the case here, the character pointer points at the least significant byte of the larger types. That is 0x78, 0xAB and so on.
Now as it happens, the char type is problematic for doing hardware-related programming like this, because it has implementation-defined signedness. Meaning that a compiler can let it either have the range -128 to 127 (2's complement) or 0 to 255. In your case it picked the former. A byte with the value 0xAB, when read through the char*, will get treated as the negative value -85.
Then upon promotion to int when passed to printf, this negative value gets "sign extended": the compiler preserves the decimal value -85. But since it is now stored in a 32 bit int, the binary representation of that value is 0xFFFFFFAB. If we would attempt to print that with %d we'd get -85. If converting to unsigned int we would get 4294967211.
But in case of %ld and 8 byte data, printf reads a 64 bit long, and the value 4294967211 fits inside one. The implicit promotion always goes to type int, never to long, so no sign extension to 64 bits takes place and a positive value is printed. Had you done an explicit conversion and passed (long)*cc to printf instead, then it would have been sign extended as was done previously and it would print -85.
As you can tell, it's a bad idea to use char or any signed types whenever dealing with raw data. Preferably use the unsigned integer types from stdint.h instead, that is, uint8_t to uint64_t.
CodePudding user response:
cc = &i; sets cc to point to the first byte of i. This assignment violates a constraint of the C standard, so the compiler is required to issue a diagnostic message, but it may accept the program.
In printf("\nc: %d, cc: %u", *cc, cc); the expression *cc is the value stored in the first byte of i. Your C implementation uses eight-bit bytes and stores the bytes of an int in little-endian order, meaning the low-value byte is at the lowest memory address. The lowest byte of 0x12345678 is 0x78, which is 120 in decimal, so printf prints “120” for this. Then %u instructs printf to print an unsigned int, but you pass it cc, which is a pointer. The behavior of this is not defined by the C standard. printf apparently does what it can in this situation. You should instead print the pointer by using %p for the conversion specification and (void *) cc for the argument.
cc = &l; sets cc to point to the first byte of l, as with i above.
In printf("\nc: %ld, cc: %u", *cc, cc); the expression *cc is the first byte of l, which is 0xAB. In your C implementation, char is signed and two’s complement, so the bits of 0xAB are interpreted as −85. This is automatically promoted to an int with value −85, which is represented with the bits 0xFFFFFFAB. However, %ld instructs printf to expect a long int, where you have only passed an int. What may have happened here is that some 32 neighboring zero bits (possibly in the same register used to pass the bits 0xFFFFFFAB) may have been used with the 32 bits 0xFFFFFFAB to form the bits 0x00000000FFFFFFAB, and printf used that as a long int. As a long int, those bits represent 4,294,967,211, so printf printed “4294967211”.
cc = &f; sets cc to point to the first byte of f, as above.
In printf("\nc: %f, cc: %u", *cc, cc); the expression *cc is the first byte of f. It is some char value, which is again promoted to int, but %f instructs printf to expect a double. In your C implementation, double arguments and int arguments may be passed in different processor registers. So, when printf fetches bits from where it expects a double, it does not get any bits at all of the int you are passing. The bits it did get may have been all zeros, which represent zero in common floating-point encoding schemes, so printf printed “0.000000”.
Also in the output from that printf, we see “cc: 4294967235”. This value, 4,294,967,235, is different from the other values we saw printed for addresses, like 591,272,048. The reason for this is that, after printf tried to get a double value for %f, it then tried to get an unsigned argument for %u. Instead of getting it from the place where a second integer argument is passed, as in the previous printf calls, it got it from the place where a first integer argument is passed. Thus, it got the int value passed for *cc instead of the address passed for cc.