Why does dereferencing a char** value (pointer-to-pointer-to-char) differ from dereferencing a char*

If an array of char* is defined with at least one element, the first element can be accessed as "arr[0]" and printed successfully with the format specifier "%s". However, if a char** is defined instead, printing its first element with printf or a similar function (using the string format specifier) segfaults.

Below is a block of code that demonstrates how this works:

char* s = "test";
char** sa = { s };
char* arr[] = { s };

// Prints pointers of each variable
puts("\nTest 1:");
puts("----------------------------------------");
printf("s:\t%p\n", s);
printf("sa:\t%p\n\n", sa);

// Dereferences pointers, prints as pointers (even though *s is a char)
puts("Test 2:");
puts("----------------------------------------");
printf("s:\t%p\n", *s);
printf("sa:\t%p\n\n", *sa);

// Prints strings of each variable
puts("Test 3:");
puts("----------------------------------------");
printf("s:\t%s\n", s);
printf("sa:\t%s\n\n", sa);

// Prints first element of array (works unlike *(char**), which I would think are the same)
puts("Test 4:");
puts("----------------------------------------");
printf("arr:\t%s\n", arr[0]);
printf("arr:\t%s\n\n", *arr);

// Prints strings of dereferenced pointers (segfault for sa)
puts("Test 5:");
puts("----------------------------------------");
printf("s:\t%s\n", *s);
printf("sa:\t%s\n\n", *sa);

This is the potential output:

Test 1:
----------------------------------------
s:      0x10013bf2c
sa:     0x10013bf2c

Test 2:
----------------------------------------
s:      0x74
sa:     0x65540a0074736574

Test 3:
----------------------------------------
s:      test
sa:     test

Test 4:
----------------------------------------
arr:    test
arr:    test

Test 5:
----------------------------------------
zsh: segmentation fault  ./test2

I understand that dereferencing variable s returns the ASCII value of the first character, which is 't' in this case. However, I'm confused about why variable sa doesn't dereference to the first string in the array. Shouldn't dereferencing the double pointer return the pointer s?

I've tried printing a string through the pointer that dereferencing sa returns, but it only segfaults, which suggests that the pointer shown for sa under Test 2 has no association with the string bound to s.

CodePudding user response：

For starters, you do not have an array. In this declaration

char** sa = { s };

you declared a pointer of the type char ** that is initialized by the value of the variable s of the type char *.

char* s = "test";
char** sa = { s };

The compiler should issue a message that there is no implicit conversion between these two pointer types.

The difference between these two outputs

printf("s:\t%p\n", *s);
printf("sa:\t%p\n\n", *sa);

is that in the first case the expression *s has the type char, which is promoted to the type int. That is, only the first byte of the string literal is read as a value. (Strictly speaking, passing an int where %p expects a void * is itself undefined behavior.)

In the second case the expression *sa has the pointer type char *. So evaluating it reads the memory that sa points to (the bytes of the string literal) as if that memory stored a pointer of 4 or 8 bytes, depending on the system, without any promotion as in the first case.

As for your comment to my answer

Also, when the char* version of sa (when sa is dereferenced) is attempted to be printed as a string using the format specifier "%s", a segfault occurs. This is the main source of what I'm a bit confused about

When the pointer sa is dereferenced, the resulting value is composed of the characters stored in the string literal (and the bytes that follow it), and this value is interpreted as a pointer when the conversion specifier %s is used. That of course results in undefined behavior.

See the output of the code you showed in your question

sa:     0x65540a0074736574

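The low-order part of the output value, 74736574, represents the word "test" written in reverse order, "tset": 0x74 is 't', 0x65 is 'e' and 0x73 is 's', and a little-endian machine stores the least significant byte first.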

You could write instead

char* s = "test";
char** sa = { &s };

and then

printf( "%s\n", *sa );

In this case the pointer sa contains the address of the declared pointer s, and the expression *sa yields the value stored in the pointer s, which is the address of the first character of the string literal "test".

CodePudding user response：

The issue is that the types char ** and char * are not compatible: as you can see at a glance, one has an extra level of indirection that must be applied to reach the underlying char object.

In C you can use the initialiser syntax ({ and }) even for non-array types, so char** sa = { s }; is equivalent to the plain initialisation char** sa = s;.

Furthermore, sa now holds the same address as s, so *sa reads the characters of the string as if they were a pointer. Here we have potential UB, because the value that *sa yields is not guaranteed to be a valid location.

The crash actually happens when printf tries to access that location.

That is because (for example, on a 32-bit architecture) *sa reads the first 4 bytes of the string that s points to and interprets them as an address (the resulting value depends on endianness).

The difference between:

char** sa = { s };
char* arr[] = { s };

is that in the first case the braces are superfluous and mean nothing, while in the second case they enclose the list of array elements (here a single char *, initialised with s, which is also a char *). The second declaration therefore creates an array with a single element.

There is a difference between an array and a pointer.