What happens when the address of a long variable is stored in a char pointer?

Time:03-04

I'm trying to understand what happens when I store a pointer to a different data type in a char pointer. I understand everything except why these two lines:

…
char *cc;
long l;
…
cc = &l;
printf("\nc: %ld, cc: %u", *cc, cc);

are printing this:

c: 4294967211, cc: 591272048

instead of this:

c: 171 (0xAB), cc: 591272048?

Code:

#include <stdio.h>    
int main()
{
  char c = 65, *cc;
  int i = 0x12345678;
  long l = 0x12345AB;
  float f = 3.14;

  cc = &c;
  printf("c: %c, cc: %u", *cc, cc);
  cc = &i;
  printf("\nc: %d, cc: %u", *cc, cc);
  cc = &l;
  printf("\nc: %ld, cc: %u", *cc, cc);
  cc = &f;
  printf("\nc: %f, cc: %u", *cc, cc);
}

Prints:

c: A, cc: 591272063
c: 120, cc: 591272056
c: 4294967211, cc: 591272048
c: 0.000000, cc: 4294967235

Answer:

You are invoking Undefined Behaviour, and get... undefined results! This particular output can happen if, on your target architecture:

  • the char type is signed
  • the int type is 32 bits long
  • the long type is 64 bits long
  • endianness is little endian
  • negative numbers use 2's complement representation

Then when executing

cc = &l;
printf("\nc: %ld, cc: %u", *cc, cc);

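*cc is the lowest-order byte of l (on a little-endian machine, the byte at the lowest address), 0xAB, which as a signed char is -85. It is promoted to int, keeping the value -85, whose 32-bit representation is 0xFFFFFFAB. You already rely on a lot of implementation-defined behaviour up to that point...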

But when you use %ld in printf, the program tries to read a long (64 bits here) when you only passed a char value that was promoted to int (32 bits). The 32 highest-order bits happen to be 0 here, but that part is UB and anything could happen. Read as a long, 0x00000000FFFFFFAB is indeed 4294967211.

But as you invoked UB, you cannot rely on that...

TL;DR: your code invokes Undefined Behaviour and the result could be anything.

Answer:

There are many things wrong with this program; it contains multiple forms of undefined behavior. I recommend compiling with the settings described in "What compiler options are recommended for beginners learning C?"

First of all, most of these pointer assignments are invalid C: you cannot assign a pointer to one type to a pointer to a different, non-compatible type (see C17 6.5.16.1 and 6.5.4). You must use an explicit cast, for example cc = (char*)&i;.

Second, it is undefined behavior to use one conversion specifier in printf but pass an argument of a different type. Similarly, you must print pointers using %p, and you should cast the pointer to void* before printing it.

You should always place \n at the end of each printf call: when stdout is line buffered, writing \n is what flushes it and makes the text appear on the screen.

After correcting most of these cases of invalid C/undefined behavior/bugs except for still using wrong conversion specifiers, the code looks like this:

#include <stdio.h>

int main (void)
{
  char c = 65, *cc;
  int i = 0x12345678;
  long l = 0x12345AB;
  float f = 3.14;

  cc = &c;
  printf("c: %c, cc: %p\n",  *cc, (void*)cc);
  cc = (char*)&i;
  printf("c: %d, cc: %p\n",  *cc, (void*)cc);
  cc = (char*)&l;
  printf("c: %ld, cc: %p\n", *cc, (void*)cc);
  cc = (char*)&f;
  printf("c: %f, cc: %p\n",  *cc, (void*)cc);
}

This code still invokes undefined behavior, so you are not guaranteed to get any particular result. However, when I run it with gcc on x86_64 Linux, I get the same output as you do.

If we try to reason about what the compiler does here, then in all cases the character you pass to printf gets implicitly promoted to int. This is because printf is a "variadic function", and such functions come with a special rule of implicit argument promotion called "the default argument promotions". This means you might as well write (int)*cc and you would get the same results.

On a little endian computer as is the case here, the character pointer points at the least significant byte of the larger types. That is 0x78, 0xAB and so on.

Now, as it happens, the char type is problematic for hardware-related programming like this, because its signedness is implementation-defined. That is, a compiler can give it either the range -128 to 127 (2's complement) or 0 to 255. In your case it picked the former, so a byte like 0xAB read through the char* is treated as the negative value -85.

Then, upon promotion to int when passed to printf, this negative value gets "sign extended": the compiler preserves the decimal value -85. But since it is now stored in a 32-bit int, the binary representation of that value is 0xFFFFFFAB. If we printed that with %d we'd get -85; converted to unsigned int, it would be 4294967211.

But with %ld and an 8-byte long, printf reads 64 bits, and the value 4294967211 (0x00000000FFFFFFAB if the upper half happens to be zero) fits in it as a positive value. The implicit promotion always goes to int, never to long. Had you instead passed an explicit conversion, (long)*cc, to printf, the value would have been sign extended to 64 bits as before, and it would print -85.

As you can tell, it's a bad idea to use char or any signed type when dealing with raw data. Preferably use the unsigned integer types from stdint.h instead, that is, uint8_t through uint64_t.

Answer:

cc = &i; sets cc to point to the first byte of i. This assignment violates a constraint of the C standard, so the compiler is required to issue a diagnostic message, but it may accept the program.

In printf("\nc: %d, cc: %u", *cc, cc);, *cc is the value stored in the first byte of i. Your C implementation uses eight-bit bytes and stores the bytes of an int in little-endian order, meaning the low-value byte is at the lowest memory address. The lowest byte of 0x12345678 is 78₁₆, which is 120₁₀, so printf prints “120” for this. Then %u instructs printf to print an unsigned int, but you pass it cc, which is a pointer. The behavior of this is not defined by the C standard; printf apparently does what it can in this situation. You should instead print the pointer by using %p for the conversion specification and (void *) cc for the argument.

cc = &l; sets cc to point to the first byte of l, as with i above.

In printf("\nc: %ld, cc: %u", *cc, cc);, *cc is the first byte of l, which is AB₁₆. In your C implementation, char is signed and two’s complement, so the bits of AB₁₆ are interpreted as −85. This is automatically promoted to an int with value −85, which is represented with the bits FFFFFFAB₁₆. However, %ld instructs printf to expect a long int, whereas you have only passed an int. What may have happened here is that some 32 neighboring zero bits (possibly in the same register used to pass the bits FFFFFFAB₁₆) were combined with the 32 bits FFFFFFAB₁₆ to form the bits 00000000FFFFFFAB₁₆, and printf used that as a long int. As a long int, those bits represent 4,294,967,211, so printf printed “4294967211”.

cc = &f; sets cc to point to the first byte of f, as above.

In printf("\nc: %f, cc: %u", *cc, cc);, *cc is the first byte of f. It is some char value, which is again promoted to int, but %f instructs printf to expect a double. In your C implementation, double arguments and int arguments may be passed in different processor registers. So, when printf fetches bits from where it expects a double, it does not get any bits at all of the int you are passing. The bits it did get may have been all zeros, which represent zero in common floating-point encoding schemes, so printf printed “0.000000”.

Also in the output from that printf, we see “cc: 4294967235”. This value, 4,294,967,235, is different from the other values we saw printed for addresses, like 591,272,048. The reason for this is that, after printf tried to get a double value for %f, it then tried to get an unsigned argument for %u. Instead of getting it from the place where a second integer argument is passed, as in the previous printf calls, it got it from the place where a first integer argument is passed. Thus, it got the int value passed for *cc instead of the address passed for cc.
