Why am I able to change a string constant in C?

Time:10-25

In C when you define a string using a char pointer (technically you are defining a pointer variable), it creates an array of the characters in a read-only segment of memory, then returns a pointer to it. That means you should not be able to modify it since it is a constant. If you wanted to modify it you should use a char array instead or use malloc.

But for some reason I am actually able to change the string, how is this even possible?

#include <stdio.h>

int main() {
    // The literal "Hello" has type char[6] in C (5 letters plus '\0'); it is not const-qualified, but modifying it is undefined behavior
    char * msg = "Hello";

    msg = "New string"; // why / how does this work?

    printf("%s \n", msg); // New string
}

But at the same time this does not work (which makes sense)

#include <stdio.h>

int main() {
    char * msg = "Hello";

    *(msg + 1) = 'a'; // Error: segmentation fault
    *(msg + 1) = "a"; // Also wrong: assigns a pointer to a char (compiler diagnostic)

    printf("%s \n", msg);
}

CodePudding user response:

"Hello" and "New string" are string literals stored somewhere in memory.

char * msg = "Hello";

Sets the pointer msg to the address of the first character of the string literal.

msg = "New string";

Replaces the value previously stored in the pointer msg. It does not modify the previously assigned string literal; it only modifies the pointer msg.

Second example

 *(msg + 1) = 'a';

This attempts to modify the string literal itself, which is undefined behavior in C.

CodePudding user response:

Your "EDIT" explanation is wrong, or at least not exact.

char * msg = "Hello";

As you said, this reserves space on the stack for a pointer (in practice, 4 or 8 bytes, depending on the architecture), and also 6 bytes in an area of memory for constants (one for each of the 5 letters H, e, l, l, o, and one more for the terminating '\0').

So msg is a variable (a classical read/write variable) whose value can be changed, and "Hello" behaves like a constant pointer. As you said, the initial value of msg is the constant pointer "Hello", which points to a constant area containing H, e, l, l, o, '\0'.

msg = "New string";

Does not allocate anything at all, at least not at run time. At compile time, the simple fact that "New string" is mentioned somewhere in the code creates another constant (one that exists as soon as the program starts, not when execution reaches this line) containing N, e, w, ' ', s, t, r, i, n, g, '\0'.

"New string" is just a constant, as 12 is. It acts as a constant pointer to a constant area of memory (so, a constant pointer to constant chars).

So all you did in the line

msg = "New string";

is change the value of a variable to another constant. It is the same as x = 12.

Note that literals such as "foo" behave like constant pointers (formally they are arrays, which are converted to a pointer to their first element in most expressions), exactly as hello does after

char hello[6]="hello";

(This is the difference between arrays and pointers: here, hello is not a variable. It is a constant, exactly as 12 is, or as "hello" is: a constant whose value is computed by the compiler. That's its job.)

Note also that I said "hello" is a constant pointer to constant chars. That is not exact either. The chars pointed to by "hello" are not constant from the language's point of view. They are lvalues, like the chars of my array hello.

If you try to compile

char hello[6]="hello";
hello="foo";
"hello"="bar";
12=13;

It would not compile at all, not even with a mere warning. The last 3 lines are exactly the same nonsense from the compiler's point of view. You can't use a constant as an lvalue: the compiler has no way of knowing where you want to store the right-hand value.

On the other hand, if you try this code

#include <stdio.h>

int main(){
    printf("Hello\n");
    "Hello\n"[1]='a';
    printf("Hello\n");
}

You don't get a compilation error. You may get a fair warning from the compiler, meaning "OK, I'll compile it, because it is legal from my point of view, but it will crash if you try to run it". Here, just as the array hello is a constant while hello[1] can still be modified (that's the whole point of arrays), the array "Hello\n" is also a constant, but "Hello\n"[1] is not, and the language lets you write the assignment.

Now, since "Hello\n" points to a segment of memory that is not supposed to be modified at run time, you get a segmentation fault.

That is fairly recent, though. When I was teaching C (some 10-15 years ago), this code ran perfectly: it printed "Hello" the first time, then "Hallo" the second. (You can see why it is a terrible idea for code readability: the constant "Hello" is a pointer to an area containing the letters Hallo.) I used this kind of example (and many other things we should never do) to illustrate what string literals really are. 10-15 years may seem like a long time ago, but it is not as if it was before the invention of segments and segmentation faults. Memory was already segmented back then, just not for that purpose.

So, obviously, I am not trying to teach you tricks here, only to help you understand what string literals are: just the name of a constant pointer to a char, as arrays are.

CodePudding user response:

First, while string literals are supposed to be immutable, they don't have to be stored in a read-only segment. There have been (and still are) implementations that store string literals in writable segments.

The behavior on trying to modify the contents of a string literal is undefined - it may work as expected, it may have no effect at all, it may lead to a runtime error.

Storage for string literals is usually allocated when the program is initially loaded (they're usually part of the program image) and is visible over the entire program. In other words, when you write

msg = "New string";

the storage for "New string" isn't allocated at that point during runtime - it was allocated when the program first started. You're just setting msg to point to that already-allocated memory.

For pointers that point to literals, it's usually a good practice to declare them const:

const char *msg = "Hello";

You can still change msg to point to a different string literal:

msg = "New string";

but if you try to modify the contents of the string itself you'll get a diagnostic during compilation:

*msg = 'n'; // *msg is const, compiler will yak

CodePudding user response:

In the first case, you are simply changing what the pointer points to.

In the second, you are trying to write to a protected memory area, so you get the segmentation fault.

That's how pointers work.
