I have been working with strings in C. While working with ways to declare them and initialize them, I found some weird behavior I don't understand.
#include<stdio.h>
#include<string.h>
int main()
{
char str[5] = "World";
char str1[] = "hello";
char str2[] = {'N','a','m','a','s','t','e'};
char* str3 = "Hi";
printf("%s %zu\n"
"%s %zu\n"
"%s %zu\n"
"%s %zu\n",
str, strlen(str),
str1, strlen(str1),
str2, strlen(str2),
str3, strlen(str3));
return 0;
}
Sample output:
Worldhello 10
hello 5
Namaste 7
Hi 2
In some cases, the above code makes str
contain Worldhello
, and the rest are as they were intialized. In some other cases, the above code makes str2
contain Namastehello
. It happens with different variables I never concatenated. So, how are they are getting combined?
CodePudding user response:
To work with strings, you must allow space for a null character at the end of each string. Where you have char str[5]="World";
, you allow only five characters, and the compiler fills them with “World”, but there is no space for a null character after them. Although the string literal "World"
includes an automatic null character at its end, you did not provide space for it in the array, so it is not copied.
Where you have char str1[]="hello";
, the compiler determines the array size by counting the characters, including the null character at the end of the string literal.
Where you have char str2[]={'N','a','m','a','s','t','e'};
, there is no string literal, just a list of individual characters. The compiler determines the array size by counting those. Since there is no null character, it does not provide space for it.
One potential consequence of failing to terminate a string with a null character is that printf
will continue reading memory beyond the string and printing characters from the values it finds. When the compiler has placed other character arrays after such an array you are printing, characters from those arrays may appear in the output.
If you allow space for a null character in str
and provide a zero value in str2
, your program will print strings in an orderly way:
#include <stdio.h>
#include <string.h>
int main(void)
{
char str[6] = "World"; // 5 letters plus a null character.
char str1[] = "hello";
char str2[] = {'N', 'a', 'm', 'a', 's', 't', 'e', 0}; // Include a null.
char *str3 = "Hi";
printf("%s %zu\n%s %zu\n%s %zu\n%s %zu\n",
str, strlen(str),
str1, strlen(str1),
str2, strlen(str2),
str3, strlen(str3));
return 0;
}
CodePudding user response:
Undefined behavior in non-null-terminated, adjacently-stored C-strings
Why do you get this part:
Worldhello 10
hello 5
...instead of this?
World 5
hello 5
The answer is that printf()
prints chars until it hits a null character, which is a binary zero, frequently written as the '\0'
char. And, the compiler happens to have placed the character array containing hello
right after the character array containing World
. Since you explicitly forced the size of str
to be 5
via str[5]
, the compiler was unable to fit the automatic null character at the end of the string. So, with hello
happening to be (not guaranteed to be) right after World
, and printf()
printing until it sees a binary zero, it printed World
, saw no terminating null char, and continued right on into the hello
string right after it. This resulted in it printing Worldhello
, and then stopping only when it saw the terminating character after hello
, which string is properly terminated.
This code relies on undefined behavior, which is a bug. It cannot be relied upon. But, that is the explanation for this case.
Run it with gcc on a 64-bit Linux machine online here: Online GDB: undefined behavior in NON null-terminated C strings
@Eric Postpischil has a great answer and provides more insight here.
CodePudding user response:
From the C tag wiki:
This tag should be used with general questions concerning the C language, as defined in the ISO 9899 standard (the latest version, 9899:2018, unless otherwise specified — also tag version-specific requests with c89, c99, c11, etc).
You've asked a "how?" question about something that none of those documents defines, and so the answer is undefined in the context of C. You can only experience this phenomenon through undefined behaviour.
how are they are getting combined?
There is no such requirement that any of these variables are "combined" or are immediately located after each other; trying to observe that is undefined behaviour. It may appear to coincidentally work (whatever that means) for you at times on your machine, while failing at other times or using some other machine or compiler, etc. That's purely coincidental and not to be relied upon.
In some cases, the above code assigns str with Worldhello and the rest as they were intitated.
In the context of undefined behaviour, it makes no sense to make claims about how your code functions, as you've already noticed, the functionality is erratic.
I found some weird Behaviour with them.
If you want to prevent erratic behaviour, stop invoking undefined behaviour by accessing arrays out of bounds (i.e. causing strlen
to run off the end of an array).
Only one of those variables is safe to pass to strlen
; you need to ensure the array contains a null terminator.