I'm trying to better understand how different methods of declaring strings in C translate to different types of memory allocation.
Bus errors occur when attempting to modify some strings, but not others. Whether or not an error occurs clearly depends on the way the string being modified is declared, but I don't feel I have a robust understanding of why. More specifically, I don't understand where actual characters are stored when a string literal is assigned to a char*. Here's a simple test example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char** argv) {
    int stack_var = 5;
    char* str1 = "This is a string.";
    char str2[17] = "This is a string.";
    char* heap_str = malloc(18);
    strcpy(heap_str, "This is a string.");
    printf("\n addresses: %p, %p, %p, %p, %p", &stack_var, str1, &str1, str2, heap_str);
    int option = atoi(argv[1]);
    if (option == 1) {
        str1[5] = '\0'; // Bus error
    } else if (option == 2) {
        str2[5] = '\0'; // No problem
    } else if (option == 3) {
        heap_str[5] = '\0'; // No problem
    }
}
On an arbitrary run on my computer, here's the address printout:
addresses: 0x7ff7be6f585c, 0x10180df66, 0x7ff7be6f5850, 0x7ff7be6f5870, 0x600002129120%
Although str1, str2, and heap_str all resolve to the same string, the addresses make several differences clear:
- str2 points to a sequence of chars stored in the stack.
- The address of str1 is a location in the stack; str1 is stored in the stack as a pointer, only.
- The character sequence pointed to by str1 is not in the stack.
- The character sequence pointed to by str1 doesn't look like it's in the heap, either.
So my question boils down to this: where are the characters pointed to by str1 stored?
I'm guessing (hoping) that the answer clarifies why that sequence of characters cannot be modified, while the stack and heap versions of the same can be.
I'm new to C, so apologies for any conspicuous naivety. Thanks in advance!
CodePudding user response:
In this declaration
char* str1 = "This is a string.";
you declared a pointer to a string literal. Although in C string literals have types of non-constant character arrays, you still may not modify a string literal, as you are trying to do with
str1[5] = '\0';
Any attempt to change a string literal results in undefined behavior.
String literals have static storage duration. Their lifetimes do not depend on the block scope in which they are used. Compilers usually store string literals in literal pools.
The pointer str1 itself has automatic storage duration.
In C++, unlike in C, string literals have types of constant character arrays. So in C++ you have to write
const char* str1 = "This is a string.";
It is good practice to declare pointers to string literals with the const qualifier in C as well.
In this declaration
char str2[17] = "This is a string.";
you allocated an array with automatic storage duration. The array does not contain a string, because it has no room for the terminating zero character '\0' of the string literal used as an initializer.
In C++ such a declaration is invalid.
In this declaration
char* heap_str = malloc(18);
you dynamically allocated memory for a character array. The array has allocated storage duration.
CodePudding user response:
Newbies to C often fail to recognize or grasp the differences between a "string", a "string literal", an "array of char", and a "pointer to char". Those are four distinct things.
- an array of char is an object (thus having associated storage) that can contain a specific, positive, number of char values.
- a string is a sequence of one or more char values, the last of which is a null character ('\0'). This is a kind of value that an array of char can contain, but it is not itself an array, and the value stored in an array of char is not necessarily of this form.
- a string literal is a lexeme in C source code that, when used as an lvalue, represents an array of char with static storage duration and the contents specified by the literal, null-terminated. The storage for these is not guaranteed to be disjoint (that is, multiple appearances of the same literal might represent the same storage), and undefined behavior results from attempts to modify the contents of the array.
- a pointer to char is exactly what the name says. If the pointer is valid, then the character to which it points might be the first in a (null-terminated) string, but it does not have to be.
It gets a bit more muddied, however, because (roughly speaking) whenever an array-valued expression is evaluated, the (array) result is automatically converted to a pointer to the first array element. This is why all the string functions accept char * arguments, and why you can assign a string literal to a variable of type char * (or better, const char *). This does not mean that you should conflate pointers with strings.
Thus, here:
char* str1 = "This is a string.";
str1 is initialized to point to the first char in the array represented by the string literal. That array has static storage duration, and attempting to modify its contents produces undefined behavior.
So my question boils down to this: where are the characters pointed to by str1 stored?
C does not say. More generally, the C language specification does not divide memory into different types or areas at all. Different C implementations often do that, but that's a separate question, and it has to be addressed per implementation.
It is entirely possible that a particular C implementation stores string literals' values in memory marked read-only (by some system-specific means), such that attempts to modify them fail. For this reason, it is usually recommended that you avoid causing pointers of type char * to point to the values of string literals. Using type const char * instead, and making sure to preserve constness as you work with those and pass them around, can ensure that you avoid attempting to modify string literals.