While working through Chapters 5.7 to 5.9 of "The C Programming Language", which deal with multi-dimensional arrays, arrays of pointers, and so on, I made the following observation:
If, in the code below, foo is declared as a pointer to a char array and I later want to assign a string literal to it, I must precede the literal with an & symbol. However, if I declare bar as a pointer to char, the very same assignment works without the & symbol.
char (*foo)[3]; // creates a single pointer to a char array of size 3
char *bar; // creates a single pointer to char
int main()
{
foo = &"AB";
bar = "AB";
return 0;
}
The disassembly (64-bit Mach-O) seems, at least to a beginner's eye, to perform the same operations for both assignments:
Disassembly of section __TEXT,__text:
0000000100003f80 _main:
100003f80: 55 push rbp
100003f81: 48 89 e5 mov rbp, rsp
100003f84: 31 c0 xor eax, eax
100003f86: 48 8d 0d 7b 00 00 00 lea rcx, [rip + 123] // address of 'bar'
100003f8d: 48 8d 15 6c 00 00 00 lea rdx, [rip + 108] // address of 'foo'
100003f94: c7 45 fc 00 00 00 00 mov dword ptr [rbp - 4], 0
100003f9b: 48 8d 35 08 00 00 00 lea rsi, [rip + 8] // address pointing to 'AB'-String
100003fa2: 48 89 32 mov qword ptr [rdx], rsi // store address of 'AB' in 'foo'
100003fa5: 48 89 31 mov qword ptr [rcx], rsi // store address of 'AB' in 'bar'
100003fa8: 5d pop rbp
100003fa9: c3 ret
Disassembly of section __TEXT,__cstring:
0000000100003faa __cstring:
100003faa: 41 42 <unknown>
100003fac: 00 <unknown>
Disassembly of section __DATA,__common:
0000000100004000 _foo:
...
0000000100004008 _bar:
...
Since the book I am reading is my first contact with C, I'm afraid I'm missing something obvious here. Wouldn't it be more logical if I needed the & symbol in both cases?
CodePudding user response:
A string literal in C has type "array of char", with a size equal to the number of characters including the terminating null byte. This means "AB" has type char [3].
In most cases, when an array is used in an expression, it decays to a pointer to its first element. This is what happens in bar = "AB". The string literal on the right-hand side decays to type char *, which can be assigned directly to bar.
One of the cases where this decay does not happen is when the array is the operand of the & operator. So taking the address of an array of type char [3] yields a pointer of type char (*)[3], which matches the type of foo.
CodePudding user response:
OK, let's assume that every instance of "AB" is stored at memory location 0xDEADBEEF. The expression "AB" is an array expression, and in most contexts it evaluates to the address of its first element. Two notable exceptions are the & and sizeof operators. In &"AB" and sizeof "AB", the array is seen as an array, not as the address of its first element.
char (*foo)[3] = &"AB";
will (on most architectures) initialize foo to the value 0xDEADBEEF, which is the address of "AB". In the expression &"AB", the array "AB" is treated as an array, and & yields its address.
char *bar = "AB";
will initialize bar to point at A, and guess what, on most architectures that is the same address as that of "AB", namely 0xDEADBEEF. Here the array "AB" was converted to a pointer to its first element.
So both foo and bar point at the same memory location, but at different objects: foo points at a sequence of 3 bytes, whereas bar points at a single byte.
Since foo points at an array, the expression *foo is the array that foo points at. In most contexts, an array expression yields a pointer to the first element of the array. So in the expression **foo, the subexpression *foo is the array "AB" (of type char [3]), but it is converted to a pointer to A (of type char *). The second * in front of *foo dereferences that pointer to A and yields 'A'.
To summarize:
foo is a pointer to char [3], with the value 0xDEADBEEF.
*foo is the array foo points at, which is "AB" and has type char [3].
In most contexts, *foo gets converted to char *, still with the value 0xDEADBEEF.
As exceptions to the previous point, in the expressions &*foo and sizeof *foo, *foo is not converted but remains the array "AB".
Dereferencing *foo, as in **foo, yields 'A'.
(foo == 0xDEADBEEF, *foo == "AB" -> 0xDEADBEEF == bar, **foo == *0xDEADBEEF == *bar == 'A')