I just discovered a very weird trick the C compiler is doing. It's very simple code, and I tried it in many online C compilers, but the result is always the same, which is driving me insane.
#include <stdio.h>
int main()
{
    char Buffer[10] = "0123456789";
    char ID[5] = "abcde";
    printf("%s",ID);
    return 0;
}
Take your time and try to predict the result of the printf call. If you're a human like me, I think the most obvious answer is "abcde", which is not correct! But if you somehow figured out "abcde0123456789", then you're consuming electricity to live.
How, just how is that possible? I'm only passing the ID array to printf, so WHY is Buffer printed along with it? It doesn't make sense; the ID array isn't even big enough to hold all that data. I'm really losing my mind here.
Answer:
The format specification %s expects a pointer to a string, that is, a sequence of characters terminated by the zero character '\0'.
However, neither of the arrays
char Buffer[10] = "0123456789";
char ID[5] = "abcde";
contains a string, so the call of printf invokes undefined behavior.
You should write
char Buffer[] = "0123456789";
char ID[] = "abcde";
or
char Buffer[11] = "0123456789";
char ID[6] = "abcde";
Note that string literals are stored as character arrays with an additional terminating zero character '\0'.
For example, this declaration
char ID[] = "abcde";
in fact is equivalent to
char ID[] = { 'a', 'b', 'c', 'd', 'e', '\0' };
and this declaration
char ID[5] = "abcde";
is equivalent to
char ID[5] = { 'a', 'b', 'c', 'd', 'e' };
That is, in the last case the terminating zero character '\0' is not used as an initializer of the array ID.
Answer:
The ID char array has no null terminator ('\0'); there is not enough space for it.
The behavior of the code is undefined because printf cannot treat ID as a string (a null-terminated char array), so it overruns the buffer and prints whatever is in the adjacent memory, which in your case is the other char array, Buffer.
You need an extra element for the terminator:
char ID[6] = "abcde"; //will automatically append \0 to the char array
Note that omitting the size is actually the best practice, as the compiler will deduce the needed size:
char ID[] = "abcde";
Most of the compilers and compiler versions I tested behave as you describe and print both arrays, but not all of them.
As you can see here:
https://godbolt.org/z/1E396Y3KG (gcc with optimization)
Or here:
https://godbolt.org/z/roa6GxWvr (msvc)
the result is not always abcde0123456789.
Answer:
The printf() format %s assumes a NUL-terminated string. But you declared ID[5] with 5 printable characters, and since you specified its size, there is no room for a NUL byte at the end.
This caused printf() to overrun the allocated space of ID, and by sheer luck it ran into the storage of Buffer.
Don't do that.
That invoked the demons of undefined behavior, and you got "lucky" that the result was only unexpected output.
Incidentally, Buffer is also initialized without a terminating NUL byte, so your printed string was terminated by whatever happened to be in memory immediately after Buffer. Since both arrays are local (automatic) variables, that layout is chosen by the compiler, typically on the stack, not by the linker in the data segment.
Answer:
Other answers are flawless and explain everything perfectly, but I'd like to show you a more practical example, since you can have a lot of fun playing with C.
Let's define 2 char arrays, array1 and array2, with different lengths and both without the terminator character '\0'.
The following code finds the lower address of the two arrays ((char*)&array1 < (char*)&array2), saves it in startingPtr, and then prints the next 100 chars (bytes) of memory starting from startingPtr, showing both the address and the content:
#include <stdio.h>
int main()
{
    char array1[10] = "0123456789";
    char array2[5] = "abcde";
    char* startingPtr;
    printf("Memory address array1: %p\nMemory address array2: %p\n", (void*)&array1, (void*)&array2);
    printf("\nContent of array1: %s\nContent of array2: %s\n", array1, array2);
    // Get which one has the lower address
    // (comparing pointers to unrelated objects is itself undefined in standard C, but works on common platforms)
    if ((char*)&array1 < (char*)&array2)
        startingPtr = (char*)&array1;
    else
        startingPtr = (char*)&array2;
    // Print memory content, starting from the memory address of the first array
    printf("\n[Memory address] Memory content:\n");
    for (int i = 0; i < 100; i++)
    {
        printf("[%p] %c\n", (void*)(startingPtr + i), *(startingPtr + i));
    }
    return 0;
}
Check the output with different compilers:
- x86-64 gcc 12.2: https://godbolt.org/z/18vjoxbM8
- x64 msvc v19.latest: https://godbolt.org/z/TfedPfzrf
As you can see, the output can differ for a number of reasons (the compiler, the machine, virtual memory, etc.).
But the reason you can sometimes see the content of both arrays is that the compiler may place the two variables next to each other, at contiguous memory addresses. Therefore printf("%s", ...), which expects a properly formatted "string" (i.e. a char buffer with the terminator character at the end), believes that your buffer is longer than 10 or 5 characters and prints the following characters too.
However, that's definitely not reliable, since it's undefined behaviour.
NB: the notation *(array + index) is one of the many ways to access an array element. Because an array used in an expression decays to a pointer to its first element, it means "get the value stored at memory address array + index", and is equivalent to array[index].
Answer:
If you are going to go to the effort of counting the number of characters in strings, a better notation would show the world that you are aware of the required '\0':
#define NUMBER_OF_DIGITS 10
char Buffer[ NUMBER_OF_DIGITS + 1 ] = "0123456789";
char ID[ 5 + 1 ] = "abcde";
Be aware:
sizeof ID != strlen( ID );
Answer:
In C, a string must end with a null terminator, so the array that holds it needs at least one more character than the visible text. Your string "abcde" therefore requires 6 characters: 5 + 1 extra. So the following will work:
char Buffer[11] = "0123456789";
char ID[6] = "abcde";
The compiler adds the '\0' automatically, since it is part of every double-quoted string literal.