How does realloc treat null bytes in strings?

Time:01-06

Relatively new C programmer here. I am reviewing the following code for a tutorial for a side project I am working on to practice C. The point of the abuf struct is to create a string that can be appended to. Here is the code:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

typedef struct abuf {
    char* str;
    unsigned int size;
} abuf;


void abAppend(abuf *ab, const char *s, int len) {
  char *new = realloc(ab->str, ab->size + len);
  if (new == NULL) return;
  memcpy(&new[ab->size], s, len);
  ab->str = new;
  ab->size += len;
}

int main(void) {
    abuf ab = {
        NULL,
        0
    };

    char *s = "Hello";

    abAppend(&ab, s, 5);
    abAppend(&ab, ", world", 7);

    return 0;
}

Everything compiles and my tests (redacted for simplicity) show that the string "Hello" is stored in ab's str pointer, and then "Hello, world" after the second call to abAppend. However, something about this code confuses me. On the initial call to abAppend, the str pointer is null, so realloc, according to its man page, should behave like malloc and allocate 5 bytes of space to store the string. But the string "Hello" also contains the terminating null byte, \0. This should be the sixth and final byte of the string, if I understand this correctly. Isn't this null byte lost if we store "Hello\0" in a malloc-ed container large enough only to store "Hello"?

On the second call to abAppend, we concatenate ", world" to str. The realloc will enlarge str to 12 bytes, but the 13th byte, \0, is not accounted for. And yet, everything works, and if I test for the null byte with a loop like for (int i = 0; ab.str[i] != '\0'; i++), the loop works fine and increments i 12 times (0 thru 11), and stops, meaning it encountered the null byte on the 13th iteration. What I don't get is why does it encounter the null byte, if we don't allocate space for it?

I tried to break this code by doing weird combinations of strings, to no avail. I also tried to allocate an extra byte in each call to abAppend and changed the function a little to account for the extra space, and it performed the exact same as this version. How the null byte gets processed is eluding me.

CodePudding user response:

How does realloc treat null bytes in strings?

The behavior of realloc is not affected by the contents of the memory it manages.
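To make that concrete, here is a minimal sketch (the function name realloc_preserves_bytes is made up for illustration) that stores four bytes, two of which are '\0', grows the block with realloc, and checks that all four bytes survived. realloc is only told a size; it never looks for a terminator.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical check, not part of the original code.
 * Returns 1 if realloc preserved all four bytes (including the
 * embedded '\0' bytes), 0 if not, and -1 if an allocation failed. */
int realloc_preserves_bytes(void) {
    const char data[4] = { 'a', '\0', 'b', '\0' };

    char *p = malloc(sizeof data);
    if (p == NULL) return -1;
    memcpy(p, data, sizeof data);      /* raw bytes, not a C string */

    char *q = realloc(p, 8);           /* grow from 4 to 8 bytes */
    if (q == NULL) {
        free(p);                       /* on failure the old block is still ours */
        return -1;
    }

    /* The first 4 bytes must be unchanged; bytes 4..7 are indeterminate. */
    int same = (memcmp(q, data, sizeof data) == 0);
    free(q);
    return same;
}
```

The contents up to the smaller of the old and new sizes are preserved; anything beyond the old size has no defined value.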

But the string "Hello" also contains the terminating null byte, \0. This should be the sixth and final byte of the string,…

The characters are copied with memcpy(&new[ab->size], s, len);, where len is 5. memcpy copies characters without regard to whether there is a terminating null byte. Given length of 5, it copies 5 bytes. It does not append a terminating null character to those.
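A small sketch of this point, using a made-up helper named byte_after_copy: the buffer is pre-filled with '#' so that it is easy to see which bytes memcpy touched.

```c
#include <string.h>

/* Hypothetical helper, not part of the original code.
 * Copies the first 5 bytes of "Hello" into a 6-byte buffer that was
 * pre-filled with '#', then returns the sixth byte. */
char byte_after_copy(void) {
    char buf[6];
    memset(buf, '#', sizeof buf);   /* every byte starts out as '#' */
    memcpy(buf, "Hello", 5);        /* copies 'H','e','l','l','o' and nothing else */
    return buf[5];                  /* untouched: the '\0' of "Hello" was not copied */
}
```

Here the sixth byte exists, so reading it is fine. In abAppend the sixth byte was never allocated, which is the difference that matters.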

The realloc will enlarge str to 12 bytes, but the 13th byte, \0, is not accounted for.

On the second call to abAppend, 7 more bytes are copied with memcpy, after the first 5 bytes. memcpy is given a length of 7 and copies only 7 bytes.

… it encountered the null byte on the 13th iteration.

When you tested ab.str[12], you exceeded the rules for which the C standard defines the behavior. ab.str[12] is outside the allocated memory. It is possible it contained a null byte solely because nothing else in your process had used that memory for another purpose, and that is why your loop stopped. If you attempted this in the middle of a larger program that had done previous work, that byte might have contained a different value, and your test might have gone awry in a variety of ways.
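Since the struct already records the length, one way to stay inside the allocation is to bound every loop by size instead of scanning for '\0'. The sketch below assumes the same abuf layout as in the question; abCountByte and abWrite are hypothetical helper names.

```c
#include <stdio.h>

typedef struct abuf {
    char *str;
    unsigned int size;
} abuf;

/* Hypothetical helper: counts occurrences of c, reading only the
 * ab->size bytes that were actually allocated. */
unsigned int abCountByte(const abuf *ab, char c) {
    unsigned int count = 0;
    for (unsigned int i = 0; i < ab->size; i++) {   /* never touches str[size] */
        if (ab->str[i] == c) count++;
    }
    return count;
}

/* Hypothetical helper: writes exactly ab->size bytes to out and
 * returns the number of bytes written. No terminator is needed. */
size_t abWrite(const abuf *ab, FILE *out) {
    if (ab->size == 0) return 0;                    /* str may be NULL here */
    return fwrite(ab->str, 1, ab->size, out);
}
```

With the 12-byte "Hello, world" buffer, abCountByte(&ab, '\0') returns 0, which confirms that no terminator is stored inside the allocation.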

CodePudding user response:

You're correct that you only initially allocated space for the characters in the string "Hello" but not the terminating null byte, and that the second call only added enough bytes for the characters in the string ", world" with no null terminating byte.

So what you have is an array of characters but not a string since it's not null terminated. If you then attempt to read past the allocated bytes, you trigger undefined behavior, and one of the ways UB can manifest itself is that things appear to work properly.

So you got "lucky" that things happened to work as if you allocated space for the null byte and set it.
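If you do want str to be usable as a C string, one possible approach is to allocate one extra byte on every append and write the terminator yourself. This is only a sketch: abAppendTerminated is a made-up name, the int return value is an addition for error reporting, and size is assumed to keep counting characters without the '\0'.

```c
#include <stdlib.h>
#include <string.h>

typedef struct abuf {
    char *str;
    unsigned int size;   /* number of characters, not counting the '\0' */
} abuf;

/* Hypothetical variant of abAppend that keeps str null-terminated.
 * Returns 0 on success, -1 if realloc fails (ab is left unchanged). */
int abAppendTerminated(abuf *ab, const char *s, unsigned int len) {
    /* Old contents + new characters + 1 byte for the terminator. */
    char *grown = realloc(ab->str, (size_t)ab->size + len + 1);
    if (grown == NULL) return -1;

    memcpy(&grown[ab->size], s, len);   /* overwrites the previous '\0', if any */
    ab->str = grown;
    ab->size += len;
    ab->str[ab->size] = '\0';           /* this byte is now inside the allocation */
    return 0;
}
```

After appending "Hello" and ", world" this way, ab.size is 12 and ab.str[12] is a '\0' that you own, so the loop from the question is well defined. Remember to free(ab.str) when done.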
