Is a char array more efficient than a char pointer in C?

I'm trying to understand the under-the-hood difference between these two char declarations:

char* str1;
char[10] str2;

I understand that using char* gives a pointer to the first character in str1, while char[10] results in an array with a length of 10 bytes (because char is a single byte... I recall that in some encodings it can be more but let's just assume one byte to keep it simple).

My question is when actually assigning values to them the data obviously has to be stored somewhere and in the case of char[10] we're telling the compiler upfront to allocate ten bytes, whereas in the case of char* we're really just saying allocate a pointer to a single byte. But what happens if the string we assign to str1 is more than a single byte, how is that allocated? How much more work is needed to appropriately allocate that? Plus, what happens if we want to reassign str1 to be something longer than what was previously assigned, how is that allocated?

Because of the uncertainty from the compiler's point of view when dealing with char pointers, is it more efficient to use a char array when I either know the length ahead of time or want to limit the length to start with?

CodePudding user response：

char* str1; declares str1 as a pointer to char. It doesn't allocate memory for any character; the compiler only allocates sizeof(char*) bytes for the pointer variable itself.

str1 can point to any char object, for example the first character of a string literal or of a char array terminated with \0.

I don't know what you mean by "Is a char array more efficient than a char pointer". The two types are different and have different use cases. It sounds like asking "Is an int more efficient than a double?", which doesn't make sense.

On the other hand, char[10] str2; is not valid C syntax. I guess you mean char str2[10]; which declares str2 as an array of 10 char. This variable can store 10 char values.

str1 and str2 are two different data types.

CodePudding user response：

When discussing performance in general, allocation, access time and copy time are separate things. You seem mostly concerned about allocation.

But there are lots of misconceptions here. Arrays are used for storing. Pointers are used to point at things stored elsewhere. You cannot store any data in a pointer, you can only store an address to data allocated elsewhere.

So comparing pointers and arrays is pretty much nonsense, because they are separate things. It is similar to asking "should I live in my house at a street address or should I live in the sign stating my street address".

I understand that using char* gives a pointer to the first character in str1

No, it gives a pointer to a single character which is allocated somewhere else, and it doesn't point anywhere meaningful until you assign an address to it. When used with an array, it is typically set to point at the array's first character.

I recall that in some encodings it can be more

No, a char is by definition always 1 byte. Some exotic systems might have 16 bits per byte or such, though. This is of no concern unless you program exotic DSPs and the like. As for other character encodings, there's wchar_t, which is a different topic entirely.
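
A minimal sketch of that point: sizeof(char) is 1 by definition, while the number of bits in that byte is reported by CHAR_BIT from <limits.h>. The helper names bytes_in_char and bits_in_char are made up for this illustration.

```c
#include <limits.h>
#include <stddef.h>

/* sizeof(char) is 1 on every conforming implementation. */
size_t bytes_in_char(void)
{
    return sizeof(char);
}

/* CHAR_BIT is the number of bits in a byte: at least 8, more on some DSPs. */
int bits_in_char(void)
{
    return CHAR_BIT;
}
```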

whereas in the case of char* we're really just saying allocate a pointer to a single byte

No, we tell it to allocate room for the pointer itself, which is typically 2 to 8 bytes depending on the address width of the specific system.

But what happens if the string we assign to str1 is more than a single byte, how is that allocated?

However you like. You can point it at a read-only string literal, a static storage duration variable, a local automatic storage variable, or dynamically allocated memory. The pointer itself doesn't know or care.
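
As a rough sketch of those four options, the hypothetical function below reuses one pointer for a string literal, a static array, a local array and a malloc'ed block. The names point_at_everything, static_buf, local_buf and heap_buf are assumptions for the example, not anything from the question.

```c
#include <stdlib.h>
#include <string.h>

static char static_buf[16] = "static";   /* static storage duration */

/* Returns the combined length of the four strings, or 0 if malloc fails. */
size_t point_at_everything(void)
{
    size_t total = 0;
    const char *p;                       /* one pointer, reused for every kind of storage */

    p = "literal";                       /* string literal: must not be modified */
    total += strlen(p);

    p = static_buf;                      /* static array */
    total += strlen(p);

    char local_buf[16] = "local";        /* automatic storage, gone when the function returns */
    p = local_buf;
    total += strlen(p);

    char *heap_buf = malloc(16);         /* dynamic storage, which we must free ourselves */
    if (heap_buf == NULL)
        return 0;
    strcpy(heap_buf, "heap");
    p = heap_buf;
    total += strlen(p);
    free(heap_buf);

    return total;
}
```

In every case the pointer variable p stays the same size; only the address stored in it changes.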

How much more work is needed to appropriately allocate that?

It depends on what you want to allocate.

Because of the uncertainty from the compiler's point of view when dealing with char pointers

What uncertainty is that? Pointers are pointers and the compiler doesn't treat them much differently than other variables.

is it more efficient to use a char array when I either know the length ahead of time or want to limit the length to start with?

You need to use an array, because data cannot be stored in thin air. Again, data cannot be stored "in pointers".

CodePudding user response：

But what happens if the string we assign to str1 is more than a single byte, how is that allocated?

str1 ultimately has to point to another array of char - whether it's allocated automatically, such as

char buffer[10];
char *str1 = buffer; // equivalent to &buffer[0]

or dynamically:

char *str1 = malloc( sizeof *str1 * 10 );

or through some other method. All str1 stores is the address of a char object somewhere in memory. You're not actually saving anything to str1, you're saving it to whatever str1 points to. Assume the following declarations:

char *str;
char buffer[10];

We have something like this in memory:

      char *            char
       ---               --- 
 str: | ? |     buffer: | ? | buffer[0]
       ---               --- 
                        | ? | buffer[1]
                         --- 
                         ...
                         --- 
                        | ? | buffer[9]
                         ---

First, we assign the address of the first element of buffer to str¹:

str = buffer;

Now our picture looks like this:

      char *            char
       ---               --- 
 str: |   | --> buffer: | ? | buffer[0]
       ---               --- 
                        | ? | buffer[1]
                         --- 
                         ...
                         --- 
                        | ? | buffer[9]
                         ---

Now we can store a string in buffer using str:

strcpy( str, "foo" );

giving us

      char *            char
       ---               --- 
 str: |   | --> buffer: |'f'| buffer[0]
       ---               --- 
                        |'o'| buffer[1]
                         --- 
                        |'o'| buffer[2] 
                         --- 
                        | 0 | buffer[3]
                         --- 
                         ...
                         --- 
                        | ? | buffer[9]
                         ---

"So," you're asking yourself, "why do we bother with the pointer? Why not just store the string to buffer directly? Wouldn't that be more efficient?"

Normally, we would just store to buffer directly and avoid the overhead of the pointer if that was an option. We work through pointers in the following situations:

The array was allocated dynamically - in this case we have no option but to go through a pointer:
```
char *str = malloc( sizeof *str * 10 );
strcpy( str, "foo" );
```
The array was passed as an argument to a function - because of the decay rule, when you pass an array as a function argument what the function actually receives is a pointer to the first element (this is true of all array types, not just character arrays):
```
void foo( char *str, size_t max_size )
{
  strncpy( str, "this is a test", max_size );
  str[max_size-1] = 0;
}
```
We're using a pointer to iterate through an array of char [] or char *:
```
char table[][10] = { "foo", "bar", "bletch", "blurga", "" };
...
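char (*p)[10] = table;      // p points to one 10-char row at a time
while ( strlen( *p ) )      // stop at the empty-string sentinel
  printf( "%s\n", *p++ );   // print the current row, then step to the next one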
...
```
Of course, we could just use array notation and not bother with the pointer at all:
```
size_t i = 0;
while ( strlen( table[i] ) )
  printf( "%s\n", table[i++] );
```
Sometimes using array notation makes more sense, sometimes using a pointer makes more sense - depends on the problem at hand.

¹ Unless it is the operand of the sizeof or unary & operator, or it is a string literal used to initialize a character array in a declaration, an expression of type "N-element array of `T`" will be converted, or "decay", to an expression of type "pointer to `T`" and the value of the expression will be the address of the first element of the array.
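
A small sketch of the decay rule in that footnote, assuming three made-up helper functions: sizeof applied to the array itself gives the array size, a function that receives the array only sees a pointer, and the decayed value equals the address of the first element.

```c
#include <stddef.h>

/* The parameter is a pointer, so sizeof reports the pointer's size. */
size_t size_seen_by_callee(char *str)
{
    return sizeof str;
}

/* sizeof is one of the exceptions: no decay, so this is the array's size. */
size_t size_seen_by_owner(void)
{
    char buffer[10];
    return sizeof buffer;
}

/* In most other expressions the array decays to &buffer[0]. */
int decays_to_first_element(void)
{
    char buffer[10];
    char *p = buffer;
    return p == &buffer[0];
}
```

This is why functions such as foo above take the buffer size as a separate argument: the callee cannot recover it from the pointer.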

CodePudding user response：

Ok, let's go through your question piece by piece.

char* str1; //this is a pointer
char str2[10]; //that is an array of 10 characters
char[10] str2; //that is a compilation error
               //possibly you mistook C for Java

I understand that using char* gives a pointer to the first character in str1

char* str1 is a pointer to a character. It can point into a contiguous block of memory, e.g. a C-style string, but doesn't have to. It might point to a single character as well.

while char[10] results in an array with a length of 10 bytes (because char is a single byte... I recall that in some encodings it can be more but let's just assume one byte to keep it simple).

Yes, that is correct. Depending on where it is defined, the array can be located on the stack, or in the data segment (if it's a global); that's an implementation detail though.

My question is when actually assigning values to them the data obviously has to be stored somewhere and in the case of char[10] we're telling the compiler upfront to allocate ten bytes (...)

That's generally right.

But what happens if the string we assign to str1 is more than a single byte, how is that allocated? How much more work is needed to appropriately allocate that? Plus, what happens if we want to reassign str1 to be something longer than what was previously assigned, how is that allocated?

It really depends on the case. Given the following case:

char s1[] = "foo";
char s2[] = "bar";
char* ptr;
ptr = &s1[1]; //points to first o
ptr = &s2[2]; //points to r

nothing is really allocated. Only the contents of ptr change, the same way an integer's value would. Note that ptr can be dereferenced or passed as a C-style string in this case.

However, in the following case it cannot be passed as a C-style string, because a single char has no terminating \0:

char c1 = 'a';
char c2 = 'b';
char* ptr;
ptr = &c1; //points to a
ptr = &c2; //points to b

Now, in the case of string literals:

const char* s = "foo"; //should be const char* actually

the string is most likely stored in the binary as a global constant, and s points to its start. A mental model for it might be similar to:

//globals
const char someCompilerGeneratedName[] = "foo";

//then the pointer:
const char* s = &someCompilerGeneratedName[0];

//Note that arrays decay to pointers, 
//i.e. array name denotes address of its 1st element
//the one below is equivalent:
const char* s = someCompilerGeneratedName;

Now, the pointer can also point to dynamically allocated memory. But it does not have to.

So the following code

char single = 'c';
char* c1 = malloc(10*sizeof(char));
char* c2;

c2 = c1;
c2 = &single;

is perfectly valid.
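
For the part of the question about reassigning to something longer: when the pointer owns heap memory, one common approach is to resize the block with realloc before copying. The sketch below assumes a hypothetical helper named replace_string; it is one way to do it, not the only one.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Replaces the heap string *dst with a copy of src, resizing the block
 * as needed. *dst must be NULL or a pointer obtained from malloc/realloc,
 * and src must not point into *dst.
 * Returns 0 on success, -1 if the allocation fails (*dst is left unchanged).
 */
int replace_string(char **dst, const char *src)
{
    size_t needed = strlen(src) + 1;        /* +1 for the terminating '\0' */
    char *tmp = realloc(*dst, needed);      /* realloc(NULL, n) acts like malloc(n) */
    if (tmp == NULL)
        return -1;
    memcpy(tmp, src, needed);
    *dst = tmp;
    return 0;
}
```

If the pointer merely refers to memory owned elsewhere (a literal, another array), no allocation is involved at all: you just store a different address in it.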

From a performance standpoint: measure first. There is no easy answer here.

Now if you're asking about heap vs stack allocations, that's another story. But I'd say: measure first. Heap allocations are generally believed to be slower (often they are), but oftentimes their overhead is negligible anyway.

Also, keep in mind that

*(p + 2) = //whatever else

is equivalent to:

p[2] = //whatever else

so sometimes it might be just the case of readability.
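
A tiny sketch of that equivalence, using a made-up function same_element that writes through one notation and reads back through the other:

```c
/* Returns 1 if pointer arithmetic and subscripting reach the same element. */
int same_element(void)
{
    char buf[4] = "abc";
    char *p = buf;

    *(p + 2) = 'x';                      /* pointer arithmetic */
    int via_pointer = (buf[2] == 'x');

    p[2] = 'y';                          /* subscript notation, defined as *(p + 2) */
    int via_index = (*(p + 2) == 'y');

    return via_pointer && via_index;
}
```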

CodePudding user response：

Arrays used in expressions are implicitly converted (with rare exceptions) to pointers to their first elements. So for example if you write

char str2[10] = "Hello"; char* str1 = str2;

then these calls of puts

puts( str2 );
puts( str1 );

will be equally efficient, in the same way as writing

for ( size_t i = 0; str2[i] != '\0'; i++ )
{
    putchar( str2[i] );
}

and

for ( size_t i = 0; str1[i] != '\0'; i++ )
{
    putchar( str1[i] );
}

A difference can occur in these declarations

char str2[10] = "Hello";
char* str1 = "Hello";

In the first case the array str2 is initialized by a string literal and you may change the stored string, for example

str2[0] = 'h';

In the second case the pointer str1 points to a string literal that has static storage duration and may not be changed. So if you write

str1[0] = 'h';

then this statement will invoke undefined behavior.

On the other hand, if you write the following function

char * f( void )
{
    char str2[10] = "Hello";
    return str2;
}

then the returned pointer will be invalid because the declared array will not be alive after exiting the function.

But this function

char * f( void )
{
    char* str1 = "Hello";
    return str1;
}

will be correct because the string literal having static storage duration will be alive after exiting the function.
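
As a hedged sketch of two ways to avoid the dangling pointer: return the literal itself, or have the caller supply the array so its lifetime is the caller's concern. The names greeting_literal and greeting_into are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Safe: the literal has static storage duration and outlives the call. */
const char *greeting_literal(void)
{
    return "Hello";
}

/*
 * Safe: the caller owns the array, so nothing dangles after the call.
 * Returns 0 on success, -1 if the buffer is too small.
 */
int greeting_into(char *buf, size_t size)
{
    static const char text[] = "Hello";
    if (size < sizeof text)              /* sizeof text counts the '\0' */
        return -1;
    memcpy(buf, text, sizeof text);
    return 0;
}
```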

Also, if you declare

char* str1 = "Hello";

then the expression sizeof( str1 ) will yield the size of the pointer, which is typically either 4 or 8 depending on the system used.

But if you write

char str2[10] = "Hello";

then the expression sizeof( str2 ) will yield the size of the array that is equal to 10.

However, these function calls

strlen( str1 );

and

strlen( str2 );

are equally efficient, and both will return the value 5, which is the length of the string "Hello".
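
The contrast between sizeof and strlen can be sketched as follows; the three helper names are invented for the example.

```c
#include <stddef.h>
#include <string.h>

/* Size of the pointer object itself, not of what it points to. */
size_t pointer_object_size(void)
{
    const char *str1 = "Hello";
    return sizeof str1;
}

/* Size of the whole array, whatever string happens to be stored in it. */
size_t array_object_size(void)
{
    char str2[10] = "Hello";
    return sizeof str2;
}

/* strlen counts characters up to the '\0' and works the same for both. */
size_t string_length(const char *s)
{
    return strlen(s);
}
```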

CodePudding user response：

There are enough answers explaining the miscellaneous aspects.

When C++ first came with the STL libraries, among them string, we had a speed and memory problem: a very large number of strings.

So I made my own implementation of string with both:

char* ptr_to_actual_content;
char small_content[16];

ptr_to_actual_content = size < sizeof(small_content) ? small_content : malloc(size);

Since small strings now needed no extra allocation, the performance gain was unbelievably large. (By the way, no memory leaks.)
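
A rough C sketch of that idea, not the author's actual implementation: the struct type small_string and the functions small_string_init and small_string_free are assumptions made up here, and the size check counts the terminating \0.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *ptr_to_actual_content;   /* points at small_content or at heap memory */
    char  small_content[16];       /* inline storage for short strings */
} small_string;

/* Copies text into s. Returns 0 on success, -1 if malloc fails. */
int small_string_init(small_string *s, const char *text)
{
    size_t size = strlen(text) + 1;                 /* include the '\0' */
    s->ptr_to_actual_content =
        size <= sizeof s->small_content ? s->small_content : malloc(size);
    if (s->ptr_to_actual_content == NULL)
        return -1;
    memcpy(s->ptr_to_actual_content, text, size);
    return 0;
}

/* Frees heap storage only if the string did not fit inline. */
void small_string_free(small_string *s)
{
    if (s->ptr_to_actual_content != s->small_content)
        free(s->ptr_to_actual_content);
    s->ptr_to_actual_content = NULL;
}
```

One caveat with this layout: copying the struct by value would leave the copy's pointer aimed at the original's inline buffer, so a real implementation needs a dedicated copy function.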