Does the treatment of string literals depend on the left hand side of an assignment?

Time:09-17

While trying to wrap my head around Chapters 5.7 to 5.9 of "The C Programming Language", which deal with multi-dimensional arrays, arrays of pointers, etc., I made the following observation:

If, in the code below, foo is declared as a pointer to a char array and I later want to make it point at a string literal, I must precede the literal with the & operator. However, if I declare bar as a pointer to char, the very same assignment works without the &.

char (*foo)[3]; // creates a single pointer to a char array of size 3
char *bar;      // creates a single pointer to char

int main()
{
    foo = &"AB";
    bar = "AB";
    
    return 0;
}

The disassembly (64-bit Mach-O) seems (at least to the beginner's eye) to perform the same operations for both assignments:

Disassembly of section __TEXT,__text:

0000000100003f80 _main:
100003f80: 55                           push    rbp
100003f81: 48 89 e5                     mov rbp, rsp
100003f84: 31 c0                        xor eax, eax
100003f86: 48 8d 0d 7b 00 00 00         lea rcx, [rip + 123]        // address of 'bar'
100003f8d: 48 8d 15 6c 00 00 00         lea rdx, [rip + 108]        // address of 'foo'
100003f94: c7 45 fc 00 00 00 00         mov dword ptr [rbp - 4], 0
100003f9b: 48 8d 35 08 00 00 00         lea rsi, [rip + 8]          // address pointing to 'AB'-String
100003fa2: 48 89 32                     mov qword ptr [rdx], rsi    // store address of 'AB' in 'foo'
100003fa5: 48 89 31                     mov qword ptr [rcx], rsi    // store address of 'AB' in 'bar'
100003fa8: 5d                           pop rbp
100003fa9: c3                           ret

Disassembly of section __TEXT,__cstring:

0000000100003faa __cstring:
100003faa: 41 42                        <unknown>
100003fac: 00                           <unknown>


Disassembly of section __DATA,__common:

0000000100004000 _foo:
...

0000000100004008 _bar:
...

Since the book I am reading is my first contact with C, I'm afraid that I'm missing something obvious here. Wouldn't it be more logical if I needed the & operator in both cases?

CodePudding user response:

A string literal in C has type "array of char" with a size equal to the number of characters including the terminating null byte. This means "AB" has type char [3].

In most cases, when an array is used in an expression it decays to a pointer to the first element. This is what happens in the case of bar = "AB". The string constant on the right side decays to type char * which can be assigned directly to bar.

One of the cases where this decay does not happen is when the array is the subject of the & operator. So taking the address of an array of type char [3] yields a pointer of type char (*)[3] which matches the type of foo.

CodePudding user response:

OK, let's assume that every instance of "AB" is stored at memory location 0xDEADBEEF. The expression "AB" is an array expression, and in most contexts it is converted to the address of its first element. Two notable exceptions are the & and sizeof operators: in &"AB" and sizeof "AB", the array is seen as an array, not as the address of its first element.

char (*foo)[3] = &"AB"; will (on most architectures) initialize foo to have the value 0xDEADBEEF which is the address of "AB". In the expression &"AB", the array "AB" is treated as an array, and & returns its address.

char *bar = "AB"; will initialize bar to point at A, and guess what, on most architectures that would be the same address as "AB", which is 0xDEADBEEF. Here the array "AB" was converted to a pointer to its first element.

So both foo and bar point at the same memory location, but at different objects. foo points at a sequence of 3 bytes, but bar points at a single byte.

Since foo points at an array, the expression *foo is the array foo points at. In most contexts, array expressions yield a pointer to the first element of the array, so in the expression **foo, the subexpression *foo is the array "AB" (of type char [3]), but it is converted to a pointer to A (of type char *). The second * in front of *foo dereferences the pointer to A, and returns A.

To summarize:

  • foo is a pointer to char [3], with the value 0xDEADBEEF

  • *foo is the array foo points at, which is "AB" and has type char [3]

  • In most contexts, *foo gets converted to char *, still with the value 0xDEADBEEF

  • As exceptions to previous point, in the expressions &*foo and sizeof *foo, *foo is not converted, but remains "AB".

  • Dereferencing *foo once more, as in **foo, returns A (foo==0xDEADBEEF, *foo=="AB"->0xDEADBEEF==bar, **foo==*0xDEADBEEF==*bar=='A')
