fprintf on (unterminated?) C-string doesn't spew garbage or segfault -- undefined behavior?

Time:10-27

I recently learned (initially from here) how to use mmap to quickly read a file in C, as in this example code:

// main.c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define INPUT_FILE "test.txt"

int main(int argc, char* argv[]) {
  struct stat ss;

  if (stat(INPUT_FILE, &ss)) {
    fprintf(stderr, "stat err: %d (%s)\n", errno, strerror(errno));
    return -1;
  }

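  {
    int fd = open(INPUT_FILE, O_RDONLY);
    char* mapped;

    if (fd == -1) {
      fprintf(stderr, "open err: %d (%s)\n", errno, strerror(errno));
      return -1;
    }

    mapped = mmap(NULL, ss.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (mapped == MAP_FAILED) {
      fprintf(stderr, "mmap err: %d (%s)\n", errno, strerror(errno));
      close(fd);
      return -1;
    }

    close(fd);  // the mapping stays valid after the descriptor is closed
    // Relies on a '\0' following the file contents: the behavior in question.
    fprintf(stdout, "%s\n", mapped);
    munmap(mapped, ss.st_size);
  }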

  return 0;
}

My understanding is that this use of mmap returns a pointer to length heap-allocated bytes, where length is the second argument (here ss.st_size).
I've tested this on plain text files that are not explicitly null-terminated, e.g. a file containing the 13-byte ASCII string "hello, world!":

$ cat ./test.txt
hello, world!$
$ stat ./test.txt
  File: ./test.txt
  Size: 13              Blocks: 8          IO Block: 4096   regular file
Device: 810h/2064d      Inode: 52441       Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1000/   user)   Gid: ( 1000/   user)
Access: 2022-10-25 20:30:52.563772200 -0700
Modify: 2022-10-25 20:30:45.623772200 -0700
Change: 2022-10-25 20:30:45.623772200 -0700
 Birth: -

When I run my compiled code, it never segfaults or spews garbage -- the classic symptoms of printing an unterminated C-string.

When I run my executable through gdb, mapped[13] is always '\0'.

Is this undefined behavior?

I can't see how it's possible that the bytes that are memory-mapped from the input file are reliably null-terminated.
For a 13-byte string, the "equivalent" that I would normally have written with malloc and read would be to allocate a 14-byte array, read the file into it, then explicitly set byte 13 (0-based) to '\0'.
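
For concreteness, here is a minimal sketch of that malloc-and-read version, assuming POSIX open/fstat/read. The helper name read_file_terminated is made up for illustration, and retrying on EINTR is left out to keep it short:

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read all of `path` into a malloc'd buffer that is one
 * byte larger than the file, and terminate it explicitly.
 * Returns NULL on failure; the caller frees the result. */
char *read_file_terminated(const char *path, size_t *len_out) {
  int fd = open(path, O_RDONLY);
  if (fd == -1) return NULL;

  struct stat ss;
  if (fstat(fd, &ss) == -1) { close(fd); return NULL; }

  size_t len = (size_t)ss.st_size;
  char *buf = malloc(len + 1);           /* 14 bytes for a 13-byte file */
  if (buf == NULL) { close(fd); return NULL; }

  size_t got = 0;
  while (got < len) {                    /* read() may return short counts */
    ssize_t n = read(fd, buf + got, len - got);
    if (n < 0) { free(buf); close(fd); return NULL; }
    if (n == 0) break;                   /* file shrank while reading */
    got += (size_t)n;
  }
  close(fd);

  buf[got] = '\0';                       /* byte 13 (0-based) for "hello, world!" */
  if (len_out != NULL) *len_out = got;
  return buf;
}
```

With the test file above, read_file_terminated("test.txt", &n) would give a buffer that is safe to pass to "%s" because the terminator is written by the program rather than assumed.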

CodePudding user response:

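mmap doesn't go through malloc, so these bytes are not heap-allocated: it returns a pointer to whole pages mapped by the kernel. Pages are usually 4096 bytes each, and when the file size is not a multiple of the page size, the kernel fills the rest of the last page with zeroes, not with garbage (POSIX requires a partial page at the end of the object to be zero-filled). That is why mapped[13] is '\0' for your 13-byte file. It is not something "%s" can rely on in general, though: if the file size is an exact multiple of the page size there is no padding at all, and reading past the end of the mapping can fault.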
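
One way to avoid depending on the padding is to pass the length explicitly. The following is a sketch under stated assumptions rather than a definitive fix: the names zero_tail_bytes and print_mapped_file are hypothetical, and the page size is taken as a parameter so the arithmetic is easy to check (on a real system it would come from sysconf(_SC_PAGESIZE)).

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: number of zero-filled bytes after the file contents in
 * the last mapped page. It is 0 when the size is a multiple of the page size,
 * which is exactly the case where "%s" would run off the end of the mapping. */
size_t zero_tail_bytes(size_t file_size, size_t page_size) {
  return (page_size - file_size % page_size) % page_size;
}

/* Hypothetical helper: write the contents of `path` to `out` via mmap, using
 * the file length instead of a terminator. Returns 0 on success, -1 on error. */
int print_mapped_file(const char *path, FILE *out) {
  int fd = open(path, O_RDONLY);
  if (fd == -1) return -1;

  struct stat ss;
  if (fstat(fd, &ss) == -1) { close(fd); return -1; }
  if (ss.st_size == 0) { close(fd); return 0; }  /* mmap rejects length 0 */

  size_t len = (size_t)ss.st_size;
  char *mapped = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
  close(fd);                                     /* mapping outlives the fd */
  if (mapped == MAP_FAILED) return -1;

  size_t written = fwrite(mapped, 1, len, out);  /* exactly len bytes */
  munmap(mapped, len);
  return written == len ? 0 : -1;
}
```

For the 13-byte file and 4096-byte pages, zero_tail_bytes(13, 4096) is 4083, which is why the original program happens to work; zero_tail_bytes(4096, 4096) is 0. If you want to keep fprintf, a "%.*s" conversion with (int)len as the precision bounds the read in the same way.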
