I was experimenting with C++ and found that const char* and const char[] behave very differently in the following code. Apologies if the question is not phrased well; I am not clear on what is happening in the code.
#include <iostream>
#include <vector>
// This version uses <const char[3]> for <myStr>.
// It does not work as expected.
struct StrStruct
{
const char myStr[3];
};
// This program extracts all the string elements in <strStructList> and copies them to <strListCopy>
int main()
{
StrStruct strStruct1{"ab"};
StrStruct strStruct2{"de"};
StrStruct strStruct3{"ga"};
std::vector<StrStruct> strStructList{strStruct1, strStruct2, strStruct3};
std::vector<const char*> strListCopy{};
for (StrStruct strStructEle : strStructList)
{
strListCopy.push_back(strStructEle.myStr);
std::cout << "Memory address for the string got pushed back in is "
<< &strStructEle.myStr << std::endl;
std::cout << "Memory address for the first element of the string got pushed back in is "
<< (void *) &strStructEle.myStr[0] << "\n" <<std::endl;
}
std::cout << "Show content of <strListCopy>:" << std::endl;
for (const char*& strEle : strListCopy)
{
std::cout << strEle << std::endl;
}
}
The following is its output:
Memory address for the string got pushed back in is [address#99]
Memory address for the first element of the string got pushed back in is [address#99]
Memory address for the string got pushed back in is [address#99]
Memory address for the first element of the string got pushed back in is [address#99]
Memory address for the string got pushed back in is [address#99]
Memory address for the first element of the string got pushed back in is [address#99]
Show content of <strListCopy>:
ga
ga
ga
However, if I simply change the implementation of StrStruct from:
// This version uses <const char[3]> for <myStr>.
// It does not work as expected.
struct StrStruct
{
const char myStr[3];
};
to
// This version uses <const char*> for <myStr>.
// It works as expected.
struct StrStruct
{
const char* myStr;
};
the program's output becomes this:
Memory address for the string got pushed back in is [address#10]
Memory address for the first element of the string got pushed back in is [address#1]
Memory address for the string got pushed back in is [address#10]
Memory address for the first element of the string got pushed back in is [address#2]
Memory address for the string got pushed back in is [address#10]
Memory address for the first element of the string got pushed back in is [address#3]
Show content of <strListCopy>:
ab
de
ga
What confuses me is the following:
- Why do all the strings have the same value in the first version? I tried using const StrStruct& instead of StrStruct in the for-each loop, which solves the problem, but I do not understand how.
- Why do const char* and const char[] behave so differently? I thought they were largely the same because of the following:
const char myChars[] = "abcde";
const char* myCharsCopy = myChars;
std::cout << myChars << " vs " << myCharsCopy << std::endl;
It prints abcde vs abcde, and you can directly assign a const char[] to a const char* without any error.
- Why does changing const char[] to const char* solve the problem?
CodePudding user response:
Fundamentals necessary to understand the rest:
Arrays and Decay
struct StrStruct
{
const char myStr[3];
};
contains the data
struct StrStruct
{
const char * myStr;
};
points at the data.
Arrays decay to pointers but are not pointers themselves.
const char myChars[] = "abcde";
makes an array of exactly the right size (six characters: five letters and the null terminator) to hold "abcde" and copies the string into the array. Note that this array need not be const.
const char* myCharsCopy = myChars;
defines a pointer to a const char and initializes it from the array myChars. myChars automatically decays to a pointer to its first element in the process. myCharsCopy is not a copy of myChars; it merely holds the address of the first element of myChars. Note that so long as myChars is const, myCharsCopy must be a pointer to const. Also note that you cannot assign to an array, and that an array cannot be copied directly unless it is placed inside another data structure. The best you can normally do is copy what's in the array to another array (memcpy or strcpy, depending on the goal and whether or not the array is a null-terminated character array).
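The points above about array size, decay, and copying can be checked with a small sketch. The helper function names here are hypothetical and exist only for illustration; std::array is used as one way to get an array that copies by value.

```cpp
#include <array>
#include <cstddef>
#include <cstring>

// Size of the array object itself: five letters plus the null terminator.
std::size_t arrayObjectSize()
{
    const char myChars[] = "abcde";
    return sizeof(myChars);
}

// True if the decayed pointer refers to the array's first element.
bool decayPointsAtFirstElement()
{
    const char myChars[] = "abcde";
    const char* myCharsCopy = myChars;      // decay: no characters are copied
    return myCharsCopy == &myChars[0];
}

// std::array can be copied and assigned like any other value.
bool stdArrayCopiesContents()
{
    std::array<char, 6> original{'a', 'b', 'c', 'd', 'e', '\0'};
    std::array<char, 6> copy = original;    // copies all six characters
    return copy.data() != original.data()                       // distinct storage
        && std::strcmp(copy.data(), original.data()) == 0;      // same contents
}
```

Calling these from main should show that the array occupies six bytes, that the pointer aliases the array rather than copying it, and that std::array gives real copy semantics.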
Note that in some contexts, most notably function parameters, const char[] and const char* mean the same thing.
void func(const char a[], // adjusted to a pointer to constant char
const char * b); // also a pointer to constant char
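As a minimal sketch of that parameter rule, the compiler itself can confirm that both spellings produce the same function type. The names takesArraySyntax, takesPointerSyntax, and sameSignature are hypothetical, chosen only for this illustration.

```cpp
#include <type_traits>

void takesArraySyntax(const char a[]);   // parameter type is adjusted to const char*
void takesPointerSyntax(const char* b);  // declared as const char* directly

// Compares the two function types at compile time.
constexpr bool sameSignature()
{
    return std::is_same<decltype(takesArraySyntax),
                        decltype(takesPointerSyntax)>::value;
}

static_assert(sameSignature(), "both declarations have the same function type");
static_assert(std::is_same<decltype(takesArraySyntax), void(const char*)>::value,
              "the array parameter has become a pointer");
```

This equivalence applies only to parameters. A const char[3] member or local variable is still a real array, which is what matters for the struct in the question.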
This stuff gets really weird for reasons that (mostly) made great sense back in the 1970s. Today I strongly recommend you use library containers like std::vector and std::array instead of raw arrays.
Range-based for Loops
Range-based for loops operate on copies of the items in the list unless you specify otherwise. In the body of
for (StrStruct strStructEle : strStructList)
on the first iteration of the loop, strStructEle is not strStruct1, or even the copy of strStruct1 that sits in strStructList; it is a third identical object. The copy is destroyed at the end of the body, freeing up the storage used.
With
for (StrStruct & strStructEle : strStructList)
the loop will operate on references to the items in strStructList, so no copies are made.
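The difference between the two loop forms can be observed by comparing addresses. This is a rough sketch: Item is a hypothetical stand-in for StrStruct, and the two counting functions are made up for the illustration.

```cpp
#include <cstddef>
#include <vector>

struct Item                 // hypothetical stand-in for StrStruct
{
    const char text[3];
};

// Counts iterations in which the loop variable IS the vector element.
int countAliasedByValue(const std::vector<Item>& items)
{
    int aliased = 0;
    std::size_t i = 0;
    for (Item element : items)          // by value: a fresh copy each iteration
    {
        if (&element == &items[i]) ++aliased;
        ++i;
    }
    return aliased;
}

int countAliasedByReference(const std::vector<Item>& items)
{
    int aliased = 0;
    std::size_t i = 0;
    for (const Item& element : items)   // by reference: no copy is made
    {
        if (&element == &items[i]) ++aliased;
        ++i;
    }
    return aliased;
}
```

For a vector of three items, the by-value version should count zero matches, because the loop variable is a separate object, while the by-reference version should count three.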
Now that you're up to speed...
Point 1
Why in the first version all the strings have the same value? I tried to use const strStruct& instead of strStruct in the for each loop which solves the problem but I do not understand how.
Since
struct StrStruct
{
const char myStr[3];
};
contains the data, making a copy of a StrStruct copies the data too. The code copies the data structure here
std::vector<StrStruct> strStructList{strStruct1, strStruct2, strStruct3};
and, more importantly to the output, here
for (StrStruct strStructEle : strStructList) // strStructEle copied, so data in it is copied
{
strListCopy.push_back(strStructEle.myStr); //strStructEle.myStr decays to pointer,
// and pointer is stored in strListCopy
// this is not really a copy it's a pointer
// to data stored elsewhere
std::cout << "Memory address for the string got pushed back in is "
<< &strStructEle.myStr << std::endl; // print address of array
std::cout << "Memory address for the first element of the string got pushed back in is "
<< (void *) &strStructEle.myStr[0] << "\n" <<std::endl;
// prints address of the first item in the array, the same as the array
} // strStructEle is destroyed here, so the stored pointer is now invalid.
// Technically anything can happen at this point
In this case, what appears to happen is that the storage is reused for the strStructEle in the next iteration of the loop. This is why all the stored pointers appear to be the same. They ARE the same. They point at different objects that all resided in the same location at different points in time. All of these objects have expired, so attempting to so much as look at them is not a good idea.
const StrStruct& "fixes" the problem because no copy is made. Each iteration operates on a different object at a different location rather than a different object at the same location.
Point 3
Why does changing const char[] to const char* solve the problem?
If myStr is a pointer rather than an array, things are different:
for (StrStruct strStructEle : strStructList) // strStructEle copied, so data in it is copied
// BUT! The data in it is a pointer to data
// stored elsewhere that is NOT copied
{
strListCopy.push_back(strStructEle.myStr); //strStructEle.myStr is a pointer and is
// directly stored in strListCopy
// this is still not a copy
std::cout << "Memory address for the string got pushed back in is "
<< &strStructEle.myStr << std::endl; // print address of pointer, not what
// it points at
std::cout << "Memory address for the first element of the string got pushed back in is "
<< (void *) &strStructEle.myStr[0] << "\n" <<std::endl;
// prints address of the first item pointed to by the pointer,
// and will be a totally different address
}
Point 2
Why do const char* and const char[] behave so differently? I thought they are largely the same due to the following...
This is the result of array decay as explained above.
An aside:
vectors are at their best when they are allowed to directly contain (and own) the data they're collecting. They handle all of the memory management and they keep the data together in one nice, easily cached block.
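One way to apply this, sketched under the assumption that owning copies of the strings are acceptable, is to collect std::string values instead of pointers. The function name copyStrings is hypothetical.

```cpp
#include <string>
#include <vector>

struct StrStruct
{
    const char myStr[3];
};

// Returns a vector that owns its own copies of the characters, so the result
// does not depend on the lifetime of strStructList or of any loop variable.
std::vector<std::string> copyStrings(const std::vector<StrStruct>& strStructList)
{
    std::vector<std::string> strListCopy;
    strListCopy.reserve(strStructList.size());
    for (const StrStruct& strStructEle : strStructList)
    {
        strListCopy.emplace_back(strStructEle.myStr);    // copies the characters
    }
    return strListCopy;
}
```

Because each std::string holds its own characters, the result remains usable even after the source vector has been destroyed.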
CodePudding user response:
Based on comments by Yakov Galka and user4581301.
A few things to clear up before the answer:
Difference between const char* and const char[]:
Conceptually:
const char* is a pointer to a const char.
const char[] is the character array itself.
In Terms of Code:
const char* stores a memory address, and its own memory address is different from the one that it stores.
const char[] stores the characters themselves rather than an address. When used in most expressions it decays to the address of its first element, which is the same as the address of the array itself.
const char myCharsArray[] = "abcde"; // Writing it like this guarantees a null terminator at the end
const char* myCharsPointer = "qwert"; // A string literal is already null-terminated
std::cout << "The memory address for <myCharsArray> is "
<< &myCharsArray
<< std::endl;
std::cout << "The memory address for the first element in <myCharArray> is "
<< (void *) &myCharsArray[0]
<< std::endl;
std::cout << "The memory address for <myCharsPointer> is "
<< &myCharsPointer
<< std::endl;
std::cout << "The memory address for the first element in <myCharsPointer> is "
<< (void *) &myCharsPointer[0]
<< std::endl;
Its output is this:
The memory address for <myCharsArray> is [address#10]
The memory address for the first element in <myCharArray> is [address#10]
The memory address for <myCharsPointer> is [address#88]
The memory address for the first element in <myCharsPointer> is [address#99]
To answer those three questions:
Question 1:
In the first version, std::vector::push_back keeps adding the address of the first element of the character array inside the loop's copy, which is also the address of strStructEle.myStr itself. Because the copy happens to occupy the same storage on every iteration, that address never changes. In the end, the list is a bunch of memory addresses whose values are exactly the same, and all of them refer to an object that no longer exists.
By using const StrStruct&, a reference to the original element is used. Thus, the unique addresses of the arrays stored in the list get copied into strListCopy.
Question 2:
Difference as explained above.
Question 3:
It allows the address of the original character data to be passed around, instead of copying the contents of the character array into a temporary object and then storing the address of that temporary.