char str[] = "Stanford University";
char a = str[1];
char b = *(char *)((int *)str + 3);
char c = str[sizeof(int *)];
What are the char values of a, b and c?
a is 't', b is 'v' and c is ' ' (space). But how come c is a space? The size of int * is 4 bytes or 8 bytes, so we would get different values in the two cases. Also, b reads str[12], but how is the whole line evaluated? I mean, str is first cast to int * and then to char *, and then are we dereferencing it, or are we doing something else?
CodePudding user response:
The first value is always 't', as it is the second character in the C string stored in str.
The second value depends on the size of int on the target platform. Hint: most modern platforms use 32-bit int and 8-bit char.
The third value depends on the size of pointers to int. Hint: pointers can have a different size than int; on modern platforms they usually have 64 bits.
The values you observe on your platform are consistent with int having a size of 4 bytes (32-bit) and int * having a size of 8 bytes (64-bit). This is the case on current 64-bit systems.
Here is the explanation for the second expression:
To evaluate *(char *)((int *)str + 3), the compiler first converts str to a pointer to int, which might be misaligned(*). It then computes the address of the fourth int in an array pointed to by (int *)str, which is 3 * sizeof(int) bytes from the beginning of this array, hence 12 bytes with a 4-byte int. This address is then converted back to char *, keeping the same address. Finally, * reads the character pointed to by the latter, hence str[3 * 4], i.e. the letter 'v'.
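The four steps above can be spelled out with one named intermediate per step. This is only a sketch: the union aligned_text and the function read_via_int_pointer are hypothetical names introduced here, and the union exists solely to guarantee that the buffer is aligned for int, which the original str array does not promise.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrapper: the int member forces int alignment on text,
   so converting text to int * is well defined in this sketch. */
union aligned_text {
    int  force_alignment;
    char text[32];
};

/* Evaluates *(char *)((int *)s + n) one step at a time.
   s must be aligned for int and point to a buffer of `size` bytes. */
char read_via_int_pointer(char *s, size_t size, size_t n)
{
    assert(n * sizeof(int) < size);    /* stay inside the buffer */

    int  *as_int   = (int *)s;         /* step 1: view the address as int * */
    int  *advanced = as_int + n;       /* step 2: move n ints, i.e. n * sizeof(int) bytes */
    char *as_char  = (char *)advanced; /* step 3: back to char *, same address */
    return *as_char;                   /* step 4: read a single char */
}
```

Called with a buffer holding "Stanford University" and n = 3, the function returns the same character as text[3 * sizeof(int)], whatever sizeof(int) happens to be on the platform, provided that offset fits in the buffer.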
The behavior is simpler to explain for the third expression:
char c = str[sizeof(int *)];
just reads the character at offset sizeof(int *), which is 8 on your system, so c contains ' ', the space between Stanford and University.
Remember that both the second and third expressions are implementation-defined:
- on ancient MS-DOS systems using the small model, you would have b = 'r' and c = 'a', and using the compact or large model, b = 'r' and c = 'f';
- on old 32-bit Windows, Mac and Linux systems, you would have b = 'v' and c = 'f';
- on some exotic Cray systems with 64-bit int, computing b would have undefined behavior;
- on some embedded DSP processors, you could even have b = 'n' and c = 't'.
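The platform cases listed above can be checked by hand with a small helper that takes the sizes as parameters instead of asking the compiler. This is a sketch under stated assumptions: predict_b, predict_c and char_at are hypothetical names, the sizes are expressed in units of char (as sizeof reports them), and -1 stands in for the cases where the real expression would have undefined behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Character at `offset` in s, or -1 if the offset lies beyond the
   terminating null byte (the real expression would be undefined there). */
static int char_at(const char *s, size_t offset)
{
    assert(s != NULL);
    return offset <= strlen(s) ? (unsigned char)s[offset] : -1;
}

/* Value of b on a platform where sizeof(int) == int_size. */
int predict_b(const char *s, size_t int_size)
{
    return char_at(s, 3 * int_size);
}

/* Value of c on a platform where sizeof(int *) == ptr_size. */
int predict_c(const char *s, size_t ptr_size)
{
    return char_at(s, ptr_size);
}
```

For "Stanford University", predict_b with int_size 4 gives 'v' and predict_c with ptr_size 8 gives ' ', matching the values observed in the question. An int_size of 8 makes predict_b return -1, because offset 24 is past the end of the 20-byte array.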
(*) The misaligned pointer will not be dereferenced as an int *, but even just computing an invalid address has undefined behavior, so something weird could happen on exotic systems. If your target is a personal computer running Windows, macOS or Linux, this risky address computation should not pose a problem.
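If the goal is only to read the byte located n ints into the string, the conversion to int * can be avoided entirely by computing the byte offset directly. The function below is a minimal sketch of that alternative, not part of the original code; read_at_int_offset is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Reads the same byte as *(char *)((int *)s + n) without ever forming
   an int *, so no alignment requirement applies to s.
   `size` is the size in bytes of the array s points to. */
char read_at_int_offset(const char *s, size_t size, size_t n)
{
    size_t offset = n * sizeof(int);  /* byte distance covered by n ints */
    assert(offset < size);            /* refuse to read outside the array */
    return s[offset];
}
```

With char str[] = "Stanford University", the call read_at_int_offset(str, sizeof str, 3) yields str[3 * sizeof(int)], and the assertion fails instead of reading out of bounds on a platform where int is 8 bytes.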