What value does the line char c = str[sizeof(int *)]; store?

Time:04-07

char str[] = "Stanford University";
char a = str[1];
char b = *(char *)((int *)str + 3);
char c = str[sizeof(int *)];
 

What are the char values of a, b and c?

a = 't' 

The value of b is 'v' and c is ' ' (space). But how can c be a space? The size of int * is 4 bytes or 8 bytes, so we would get different values in the two cases. Also, b points to str[12], but how is the whole line evaluated? Is str first cast to int *, then to char *, and then dereferenced, or is something else going on?

CodePudding user response:

The first value is always 't' as it is the second character in the C string stored in str.

The second value depends on the size of int on the target platform. Hint: most modern platforms use 32-bit int and 8-bit char.

The third value depends on the size of pointers to int. Hint: pointers can have a different size than int; on modern platforms they usually have 64 bits.

The values you observe on your platform are consistent with int having a size of 4 bytes (32-bit) and int * having a size of 8 bytes (64-bit). This is the case on current 64-bit systems.

Here is the explanation for the second expression:

To evaluate *(char *)((int *)str + 3), the compiler first converts str to a pointer to int, which might be misaligned(*). It then computes the address of the fourth int in the array pointed to by (int *)str, which is 3 * sizeof(int) bytes from the beginning of this array, hence 12 bytes with a 4-byte int. This address is then converted back to char *, keeping the same address. Finally, * reads the character pointed to by the latter, hence str[3 * 4], i.e. the letter 'v'.

The behavior is simpler to explain for the third expression:

char c = str[sizeof(int *)]; just reads the character at offset sizeof(int *), which is 8 on your system, so c contains ' ', the space between Stanford and University.

Remember that both the second and third expressions are implementation-defined:

  • on ancient MS-DOS systems using the small model, you would have b = 'r' and c = 'a', and using the compact and large models (far data pointers), b = 'r' and c = 'f';
  • on old 32-bit Windows, Mac and Linux systems, you would have b = 'v' and c = 'f';
  • on some exotic Cray systems with 64-bit int, computing b would have undefined behavior;
  • on some embedded DSP processors, you could even have b = 'n' and c = 't'.

(*) The misaligned pointer will not be dereferenced as an int *, but even just computing an invalid address has undefined behavior, so something weird could happen on exotic systems. If your target is a personal computer running Windows, macOS or Linux, this risky address computation should not pose a problem.
