Is it possible for separately initialized string variables to overlap?

If I initialize several string (character array) variables in the following way:

const char* myString1 = "string content 1";
const char* myString2 = "string content 2";

Since const char* is simply a pointer to a specific char object, it does not contain any size or range information about the character array it is pointing to.

So, is it possible for two string literals to overlap each other (that is, for a newly allocated one to overlap an older one)?

By overlap, I mean the following behaviour:

// Continue from the code block above
std::cout << myString1 << std::endl;
std::cout << myString2 << std::endl;

It outputs

string costring content 2
string content 2

So the start of myString2 is somewhere in the middle of myString1. Because const char* does not "protect" ("possess") a range of memory locations but only the one it points to, I do not see how C++ can prevent other string literals from "landing" on the memory locations of the older ones.

How does C++ (or the compiler) avoid such a problem?

If I change const char* to const char[], is it still the same?

CodePudding user response：

A string is required to end with a null character having a value of 0, and can't have such a character in the middle. So the only case where this is even possible is when two strings are equal from the start of one to the end of both. That is not the case in the example you gave, so those two particular strings would never overlap.

Edit: sorry, I didn't mean to mislead anybody. It's actually easy to put a null character in the middle of a string with \0. But most string handling functions, particularly those in the standard library, will treat that as the end of a string - so your strings will get truncated. Not very practical. Because of that the compiler won't try to construct such a string unless you explicitly ask it to.
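
The condition described in this answer (one string being identical to the tail of the other) can be checked mechanically. A minimal sketch; could_share_tail is a made-up helper name, and it only says whether overlapping storage would be possible, not whether a given compiler actually does it:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: true if the shorter string is identical to the tail of
// the longer one, i.e. the two could in principle be stored in overlapping memory.
bool could_share_tail(const char* a, const char* b) {
    const std::size_t a_len = std::strlen(a);
    const std::size_t b_len = std::strlen(b);
    const char* longer  = a_len >= b_len ? a : b;
    const char* shorter = a_len >= b_len ? b : a;
    // Number of leading characters of the longer string to skip.
    const std::size_t skip = a_len >= b_len ? a_len - b_len : b_len - a_len;
    return std::strcmp(longer + skip, shorter) == 0;
}
```

For the strings in the question, could_share_tail("string content 1", "string content 2") is false, because they differ in their last character.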

CodePudding user response：

Yes, string literals are allowed to overlap in general. From lex.string#9

... Whether all string-literals are distinct (that is, are stored in nonoverlapping objects) and whether successive evaluations of a string-literal yield the same or a different object is unspecified.

So it's up to the compiler to make a decision as to whether any string literals overlap in memory. You can write a program to check whether the string literals overlap, but since it's unspecified whether this happens, you may get different results every time you run the program.
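
As a rough sketch of such a check (the helper names are hypothetical, and it assumes std::uintptr_t is available, which is the case on mainstream platforms), the addresses can be compared as integers:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true if [a, a + a_len) and [b, b + b_len) share at
// least one byte. Addresses are converted to integers first, because the
// result of using < on pointers into unrelated objects is unspecified.
bool ranges_overlap(const char* a, std::size_t a_len,
                    const char* b, std::size_t b_len) {
    const auto a_begin = reinterpret_cast<std::uintptr_t>(a);
    const auto b_begin = reinterpret_cast<std::uintptr_t>(b);
    return a_begin < b_begin + b_len && b_begin < a_begin + a_len;
}

// Convenience wrapper for null-terminated strings; the terminator is counted.
bool strings_overlap(const char* a, const char* b) {
    return ranges_overlap(a, std::strlen(a) + 1, b, std::strlen(b) + 1);
}
```

Calling strings_overlap("Hello there.", "there.") from main may return either true or false, depending on the compiler and its settings; that is exactly the unspecified part.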

CodePudding user response：

The compiler knows the size of each string, because it can "see" it in your code.

Additionally, they are not allocated the same way that you would allocate them at run time. Instead, if the strings are constant and defined globally, they are most likely placed in a read-only section of the object file (typically .rodata, or .text on some toolchains), not on the heap.

And since the compiler knows the size of a constant string at compile time, it can simply put its value in the free space of that section. The specifics depend on the compiler you use, but rest assured that the people who wrote it are smart enough to avoid this issue.

If you define these strings inside some function instead, the compiler can choose between the first option and allocating space on the stack.

As for const char[], most compilers will choose the storage in much the same way as for const char*, but note that it declares an array, not a pointer.
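
One difference is worth spelling out. A const char[] is an object of its own, initialized with a copy of the literal's characters, so two such arrays never share storage, whereas two pointers to identical literals may or may not compare equal. A minimal sketch, with variable and function names made up for illustration:

```cpp
#include <cstddef>

// Pointers to literals: whether these two point to the same storage is unspecified.
const char* const ptr1 = "same text";
const char* const ptr2 = "same text";

// Arrays: each declaration creates its own 10-byte object (9 characters + '\0').
const char arr1[] = "same text";
const char arr2[] = "same text";

static_assert(sizeof(arr1) == 10, "the array type carries its size");
static_assert(sizeof(ptr1) == sizeof(const char*), "a pointer carries no size information");

// Always true: distinct objects have distinct addresses.
bool arrays_are_distinct() {
    const char* first = arr1;
    const char* second = arr2;
    return first != second;
}

// May be true or false, depending on the compiler and its settings.
bool literals_are_shared() { return ptr1 == ptr2; }
```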

CodePudding user response：

Two string literals are unlikely to overlap unless they are the same. In that case the pointers will usually point to the same thing. (This isn't guaranteed by the standard, but I believe any modern compiler will do it.)

const char *a = "Hello there.";
const char *b = "Hello there.";

cout << (a == b);
// usually prints "1", which means they point to the same thing

Two const char * pointers can also share a single string, though.

const char *a = "Hello there.";
const char *b = a + 6;

cout << a;
// prints "Hello there."
cout << b;
// prints "there."

To answer your second question, I think an explanation of C-style strings is useful.

A const char * is just a pointer to a string of characters. The const means that the characters themselves are immutable. (They are stored as part of the executable itself, and you wouldn't want your program to change itself like this. On Unix you can use the strings command to list all the strings in an executable, e.g. strings a.out. You will see many more strings than the ones you wrote, because many come from the standard library and other things an executable requires.)

So how does it know to print the string and then stop at the end? A C-style string is required to end with a null byte (\0). The compiler implicitly puts it there when you write a string literal. So "string content 1" is actually stored as "string content 1\0".

const char *a = "Hello\0 there.";

cout << a;
// prints "Hello"
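
The characters after the embedded \0 are still stored; only functions that look for the terminator stop early. A small sketch of the difference between strlen and sizeof (the names are hypothetical, and an array is used so that sizeof reports the full size):

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// 5 characters, an embedded '\0', 7 more characters, and the implicit terminator.
const char with_null[] = "Hello\0 there.";

static_assert(sizeof(with_null) == 14, "all 14 bytes are stored");

// strlen stops at the first '\0', so it only sees "Hello".
std::size_t visible_length() { return std::strlen(with_null); }

// Passing an explicit length keeps every byte except the final terminator.
std::string all_bytes() {
    return std::string(with_null, sizeof(with_null) - 1);
}
```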

For the most part const char *a and const char a[] can be used the same way, although the first is a pointer and the second is an array.

// Both of these are valid
const char *a = "Hello";
const char b[] = "there.";


// This is valid
const char *c = b + 3; // c points to "re."
// This, however, is not valid: an array cannot be initialized from a pointer
const char d[] = b + 3;
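
If a separate array holding the tail is really wanted, the characters have to be copied explicitly. A minimal sketch under that assumption (tail_copy is a made-up name; a std::string would usually be the simpler choice):

```cpp
#include <cstring>
#include <string>

const char b[] = "there.";   // 7 bytes including the terminator
const char *c = b + 3;       // no copy: points at "re." inside b

// Copies the tail of b into its own array, then returns it as a std::string.
std::string tail_copy() {
    char d[sizeof(b) - 3];   // 4 bytes: 'r', 'e', '.', '\0'
    std::strcpy(d, b + 3);   // copies the three characters and the terminator
    return std::string(d);
}
```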