Why do <const char*> and <const char[]> have very different memory or pointer behaviour?

Time:11-25

I was experimenting with C++ and found that const char* and const char[] behave very differently in the following code. Sorry if I have not phrased this question well; I am not clear about what is happening in the code.

#include <iostream>
#include <vector>

// This version uses <const char[3]> for <myStr>.
// It does not work as expected.
struct StrStruct
{
    const char myStr[3];
};

// This program extracts all the string elements in <strStructList> and copies them to <strListCopy>
int main()
{
    StrStruct strStruct1{"ab"};
    StrStruct strStruct2{"de"};
    StrStruct strStruct3{"ga"};

    std::vector<StrStruct> strStructList{strStruct1, strStruct2, strStruct3};
    std::vector<const char*>  strListCopy{};

    for (StrStruct strStructEle : strStructList)
    {
        strListCopy.push_back(strStructEle.myStr);

        std::cout << "Memory address for the string got pushed back in is "
                  << &strStructEle.myStr << std::endl;
        std::cout << "Memory address for the first element of the string got pushed back in is "
                  << (void *) &strStructEle.myStr[0] << "\n" <<std::endl;
    }

    std::cout << "Show content of <strListCopy>:" << std::endl;
    for (const char*& strEle : strListCopy)
    {
        std::cout << strEle << std::endl;
    }
}

The following is its output:

Memory address for the string got pushed back in is [address#99]
Memory address for the first element of the string got pushed back in is [address#99]

Memory address for the string got pushed back in is [address#99]
Memory address for the first element of the string got pushed back in is [address#99]

Memory address for the string got pushed back in is [address#99]
Memory address for the first element of the string got pushed back in is [address#99]

Show content of <strListCopy>:
ga
ga
ga

However, if I simply change the definition of StrStruct

from:

// This version uses <const char[3]> for <myStr>.
// It does not work as expected.
struct StrStruct
{
    const char myStr[3];
};

to

// This version uses <const char*> for <myStr>.
// It works as expected.
struct StrStruct
{
    const char* myStr;
};

The program's output becomes:

Memory address for the string got pushed back in is [address#10]
Memory address for the first element of the string got pushed back in is [address#1]

Memory address for the string got pushed back in is [address#10]
Memory address for the first element of the string got pushed back in is [address#2]

Memory address for the string got pushed back in is [address#10]
Memory address for the first element of the string got pushed back in is [address#3]

Show content of <strListCopy>:
ab
de
ga

What confuses me is the following:

  1. Why do all the strings have the same value in the first version? I tried using const StrStruct& instead of StrStruct in the range-based for loop, which solves the problem, but I do not understand how.

  2. Why do const char* and const char[] behave so differently? I thought they were largely the same because of the following:

const char myChars[] = "abcde";
const char* myCharsCopy = myChars;

std::cout << myChars << " vs "  << myCharsCopy << std::endl;

It prints abcde vs abcde, and you can assign a const char[] directly to a const char* without any error.

  3. Why does changing const char[] to const char* solve the problem?

Answer:

Fundamentals necessary to understand the rest:

Arrays and Decay

struct StrStruct
{
    const char myStr[3];
};

contains the data

struct StrStruct
{
    const char * myStr;
};

points at the data.

Arrays decay to pointers but are not pointers themselves.

const char myChars[] = "abcde";

makes an array of exactly the right size (six characters: five letters plus the null terminator) to hold "abcde" and copies the string into the array. Note that the array need not be const.

const char* myCharsCopy = myChars;

defines a pointer to const char and initializes it with myChars. In the process, myChars automatically decays to a pointer to its first element. myCharsCopy is not a copy of myChars; it merely holds the address of myChars's first element. Note that as long as myChars is const, myCharsCopy must be a pointer to const char. Also note that you cannot assign to an array, and that it's next to impossible to copy an array unless you place it inside another data structure. The best you can normally do is copy what's in the array to another array (memcpy or strcpy, depending on the goal and whether or not the array is a null-terminated character array).

Note that in a function parameter list, const char[] and const char* mean exactly the same thing: a parameter declared as an array is adjusted to a pointer.

void func(const char a[], // accepts a pointer to const char
          const char * b); // also accepts a pointer to const char

This stuff gets really weird for reasons that (mostly) made great sense back in the 1970s. Today I strongly recommend you use library containers like std::vector and std::array instead of raw arrays.

Range-based for Loops

Range-based for loops operate on copies of the items in the list unless you specify otherwise. In the body of

for (StrStruct strStructEle : strStructList)

on the first iteration of the loop, strStructEle is not strStruct1, or even the copy of strStruct1 that sits in strStructList; it is a third, identical object. The copy is destroyed at the end of each iteration, freeing up the storage it used.

With

for (StrStruct & strStructEle : strStructList)

the loop will operate on references to the items in strStructList, so no copies are made.

Now that you're up to speed...

Point 1

Why do all the strings have the same value in the first version? I tried using const StrStruct& instead of StrStruct in the range-based for loop, which solves the problem, but I do not understand how.

Since

struct StrStruct
{
    const char myStr[3];
};

contains the data, making a copy of a StrStruct also copies the characters. The code copies the structures here

std::vector<StrStruct> strStructList{strStruct1, strStruct2, strStruct3};

and, more importantly to the output, here

for (StrStruct strStructEle : strStructList) // strStructEle copied, so data in it is copied
{
    strListCopy.push_back(strStructEle.myStr); //strStructEle.myStr decays to pointer, 
                                               // and pointer is stored in strListCopy
                                               // this is not really a copy it's a pointer 
                                               // to data stored elsewhere
    std::cout << "Memory address for the string got pushed back in is "
              << &strStructEle.myStr << std::endl; // print address of array
    std::cout << "Memory address for the first element of the string got pushed back in is "
              << (void *) &strStructEle.myStr[0] << "\n" <<std::endl;
                  // prints address of the first item in the array, the same as the array
} // strStructEle is destroyed here, so the stored pointer is now invalid. 
  // Technically anything can happen at this point

In this case, what actually happens appears to be that the storage is reused for the strStructEle of the next iteration. This is why all the stored pointers appear to be the same. They ARE the same: they point at different objects that all resided in the same location at different points in time. All of these objects have expired, so reading through the stored pointers, as the final printing loop does, is undefined behavior.

const StrStruct& "fixes" the problem because no copy is made. Each iteration operates on a different object at a different location, rather than on a different object at the same location.

Point 3

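Why does changing const char[] to const char* solve the problem?

If myStr is a pointer rather than an array, things are different. Each pointer points at a string literal such as "ab", and string literals exist for the whole run of the program, so copying a StrStruct copies only the pointer and the stored addresses stay valid: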

for (StrStruct strStructEle : strStructList) // strStructEle copied, so data in it is copied
                                             // BUT! The data in it is a pointer to data 
                                             // stored elsewhere that is NOT copied
{
    strListCopy.push_back(strStructEle.myStr); //strStructEle.myStr is a pointer and is 
                                               // directly stored in strListCopy
                                               // this is still not a copy 
    std::cout << "Memory address for the string got pushed back in is "
              << &strStructEle.myStr << std::endl; // print address of pointer, not what 
                                                   // it points at
    std::cout << "Memory address for the first element of the string got pushed back in is "
              << (void *) &strStructEle.myStr[0] << "\n" <<std::endl;
                  // prints address of the first item pointed to by the pointer, 
                  // and will be a totally different address
}

Point 2

Why do const char* and const char[] behave so differently? I thought they are largely the same due to the following...

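They only look the same because of array decay, explained above: when an array is used where a pointer is expected, it converts to a pointer to its first element. The array itself still holds the characters, while a pointer holds only an address.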

An aside:

Vectors are at their best when they are allowed to directly contain (and own) the data they're collecting. They handle all of the memory management, and they keep the data together in one nice, easily cached block.

Answer:

Based on the comments from Yakov Galka and user4581301:

A few things to clear up before the answer:

Difference between const char* and const char[]:

Conceptually:

const char* is a pointer to a const char

const char[] is the character array itself.

In Terms of Code:

const char* stores a memory address, and its own memory address is different from the one that it stores.

const char[] stores the characters themselves, not an address. Its own memory address is the same as the address of its first element, which is what it decays to when converted to a pointer.

const char myCharsArray[] = "abcde";   // a string literal always ends with a null terminator
const char* myCharsPointer = "qwert\0"; // so the explicit \0 here is redundant

std::cout << "The memory address for <myCharsArray> is "
          << &myCharsArray
          << std::endl;

std::cout << "The memory address for the first element in <myCharsArray> is "
          << (void *) &myCharsArray[0]
          << std::endl;

std::cout << "The memory address for <myCharsPointer> is "
          << &myCharsPointer
          << std::endl;

std::cout << "The memory address for the first element in <myCharsPointer> is "
          << (void *) &myCharsPointer[0]
          << std::endl;

Its output is this:

The memory address for <myCharsArray> is [address#10]
The memory address for the first element in <myCharsArray> is [address#10]
The memory address for <myCharsPointer> is [address#88]
The memory address for the first element in <myCharsPointer> is [address#99]

To answer those three questions:

Question 1:

In the first version, push_back keeps adding the address of the first element of the copied character array, which is also the address of strStructEle.myStr itself. That address never changes, because each iteration's copy reuses the same storage. In the end, the list holds identical addresses, all pointing at storage whose objects no longer exist.

By using const StrStruct&, the loop refers to the original elements in strStructList, so each element's own, distinct address is stored in the list.

Question 2:

Difference as explained above.

Question 3:

It allows the address of the original character array (the string literal) to be passed around, instead of copying the contents of the original array into a temporary object and then storing the address of that temporary.
