Is the \0 character an actual 0x00 byte? Why doesn't this break int arrays with a "0"

Time:11-28

From what I understand, array ends in C are marked in memory by the "\0" character.

But what is it, exactly? If I had a string "ABC", would the memory region for its char array look like this:

0x41 0x42 0x43 0x00?

Because if so, wouldn't that imply that an int[] array couldn't contain a 0, because that would mark its premature end? I.e., [1, 2, 0, 3, 4], stored as bytes 0x01 0x02 0x00 0x03 0x04, then upon encountering the 0x00, the program would say "oh look it's a 2 byte long array, we're done here"?

Answer:

From what I understand, array ends in C are marked in memory by the "\0" character.

No. In C, arrays do not have any end marker.

Only C strings use one. A C string is an array of char (or of wchar_t for wide-character strings), and the end of the string is indicated by the null character.

From the C standard:

A byte with all bits set to 0, called the null character, shall exist in the basic execution character set; it is used to terminate a character string.

So the null character has the value 0 (0x00 in hex). Its character constant is '\0' (or L'\0' for wchar_t strings).

Because if so, wouldn't that imply that an int[] array couldn't contain a 0

No, because the null character terminates only C strings. Nothing in an int array is treated as a terminator.

Why is it only necessary for those?

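Because the char array containing the string can be much larger than the string itself, and you need to know where the string ends. Functions such as strlen receive only a pointer to the first character, so the terminator is the only way for them to find that end.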

Of course, your own code may use a sentinel value with other element types to indicate where the actual data ends.

Answer:

From what I understand, array ends in C are marked in memory by the "\0" character.

You are mistaken.

Arrays have no end marker. An array's last element is zero only if it was explicitly or implicitly set to zero.

For example, if you declare an integer array like

int a[] = { 1, 2, 3, 4, 5 };

then none of its elements is zero. But if you declare the array like

int a[6] = { 1, 2, 3, 4, 5 };

then its last element is implicitly initialized to zero.

String literals such as "ABC" are stored as character arrays with a zero character appended. For example, the string literal "ABC" is stored in memory as an unnamed array like

char unnamed_literal[] = { 'A', 'B', 'C', '\0' };

Also, when you initialize a character array with a string literal like

char s[] = "ABC";

then all characters of the string literal, including the terminating zero, are used to initialize the elements of the array.

So if you then write

printf( "sizeof( s ) = %zu\n", sizeof( s ) );

this statement outputs the value 4.

However, in C (but not in C++) the terminating zero may be left out when a character array is initialized with a string literal whose characters exactly fill it. For example

char s[3] = "ABC";

In this case the initialized character array does not contain a string (a sequence of characters terminated by the zero character '\0'). The array s contains only three characters { 'A', 'B', 'C' }. You can check this the same way as above:

printf( "sizeof( s ) = %zu\n", sizeof( s ) );

In this case the value 3 is output.

As for your statement

[1, 2, 0, 3, 4], stored as bytes 0x01 0x02 0x00 0x03 0x04, then upon encountering the 0x00, the program would say "oh look it's a 2 byte long array, we're done here"?

note that the program itself does not "say" anything in this case. Only a function written for that purpose can check whether an array contains an element equal to zero and decide to stop there.

The C Standard does contain such functions; they rely on a character array containing a string (a sequence of characters terminated by the zero character '\0'). For example, the standard function strlen returns the number of characters stored in a character array before the terminating zero. But this value does not denote the end of the array: the character array can be much larger than the string stored in it.

For integer arrays there are no such functions in the C Standard. You could write such a function yourself if, for example, zero denotes the end of the actual elements in your arrays. But in general, zero is a valid integer value that can appear among the other elements of an integer array.

Answer:

It actually is a zero: the standard defines the null character as a byte with all bits set to 0. It's the type (char, or rather arrays of char that are treated as strings) and the convention used that make it "special", not the zero itself. Standard library functions rely on strings being null-terminated, but you are free to write your own implementations that behave differently. It might not be practical, but it is technically possible.

Also note that a non-terminated char array can be used as well, just not as a string.
