I recently learned (initially from here) how to use mmap to quickly read a file in C, as in this example code:
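// main.c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define INPUT_FILE "test.txt"

int main(int argc, char* argv[]) {
  struct stat ss;
  if (stat(INPUT_FILE, &ss)) {
    fprintf(stderr, "stat err: %d (%s)\n", errno, strerror(errno));
    return -1;
  }

  {
    int fd = open(INPUT_FILE, O_RDONLY);
    if (fd < 0) {
      fprintf(stderr, "open err: %d (%s)\n", errno, strerror(errno));
      return -1;
    }
    char* mapped = mmap(NULL, ss.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (mapped == MAP_FAILED) {
      fprintf(stderr, "mmap err: %d (%s)\n", errno, strerror(errno));
      close(fd);
      return -1;
    }
    close(fd);  // the mapping stays valid after the descriptor is closed
    // Prints the mapping as a C string, so it relies on a '\0' after the file's bytes.
    fprintf(stdout, "%s\n", mapped);
    munmap(mapped, ss.st_size);
  }

  return 0;
}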
My understanding is that this use of mmap returns a pointer to length (here, ss.st_size) heap-allocated bytes.
I've tested this on plain text files that are not explicitly null-terminated, e.g. a file containing the 13-byte ASCII string "hello, world!":
$ cat ./test.txt
hello, world!$
$ stat ./test.txt
File: ./test.txt
Size: 13 Blocks: 8 IO Block: 4096 regular file
Device: 810h/2064d Inode: 52441 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 1000/ user) Gid: ( 1000/ user)
Access: 2022-10-25 20:30:52.563772200 -0700
Modify: 2022-10-25 20:30:45.623772200 -0700
Change: 2022-10-25 20:30:45.623772200 -0700
Birth: -
When I run my compiled code, it never segfaults or spews garbage -- the classic symptoms of printing an unterminated C-string.
When I run my executable through gdb, mapped[13] is always '\0'.
Is this undefined behavior?
I can't see how the bytes that are memory-mapped from the input file could be reliably null-terminated.
For a 13-byte string, the "equivalent" that I would normally have done with malloc and read would be to allocate a 14-byte array, read from the file into memory, then explicitly set byte 13 (0-based) to '\0'.
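For comparison, the malloc-and-read approach described above might look roughly like the following. This is only a sketch: the helper name read_file_as_cstring and its out_len parameter are made up for illustration, and it assumes a regular file on a POSIX system.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read a whole file into a malloc'd buffer and
   null-terminate it explicitly. Returns NULL on failure; the caller
   frees the result. */
char *read_file_as_cstring(const char *path, size_t *out_len) {
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0) return NULL;

    struct stat ss;
    if (fstat(fd, &ss) != 0) { close(fd); return NULL; }
    size_t size = (size_t)ss.st_size;

    char *buf = malloc(size + 1);  /* one extra byte for the terminator */
    if (buf == NULL) { close(fd); return NULL; }

    size_t done = 0;
    while (done < size) {
        ssize_t n = read(fd, buf + done, size - done);
        if (n < 0) { free(buf); close(fd); return NULL; }  /* read error */
        if (n == 0) break;  /* file is shorter than fstat reported */
        done += (size_t)n;
    }
    close(fd);

    buf[done] = '\0';  /* the terminator is set by us, not by the kernel */
    if (out_len != NULL) *out_len = done;
    return buf;
}
```

For the 13-byte test.txt, this allocates 14 bytes and sets byte 13 to '\0', so the result can be passed to "%s" regardless of how the kernel lays out memory.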
Answer:
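mmap returns a pointer to whole pages mapped by the kernel. It doesn't go through malloc, so the memory is not heap-allocated. Pages are usually 4096 bytes each, and when the file's size is not a multiple of the page size, the kernel fills the rest of the last page with zeroes, not with garbage (POSIX requires this zero-fill for the partial page at the end of the mapped object). That is why mapped[13] is '\0' for a 13-byte file. It is not something to rely on, though: if the file's size is an exact multiple of the page size, there are no padding bytes, and printing with %s reads past the end of the mapping.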
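To make the page arithmetic concrete, here is a minimal sketch in C. The helper names zero_filled_tail and write_mapped are hypothetical, and the page size is passed in as a parameter (on a real system it would typically come from sysconf(_SC_PAGESIZE)). The first function computes how many padding bytes follow the file's contents in the last mapped page; the second writes the mapping by explicit length instead of depending on a terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Number of bytes between end-of-file and the end of the last mapped
   page. Returns 0 when the file size is an exact multiple of the page
   size, in which case no byte after the file's data belongs to the
   mapping. */
size_t zero_filled_tail(size_t file_size, size_t page_size) {
    assert(page_size > 0);
    return (page_size - file_size % page_size) % page_size;
}

/* Write exactly len bytes of the mapping to out, without relying on a
   '\0' terminator. Returns the number of bytes written. */
size_t write_mapped(FILE *out, const char *mapped, size_t len) {
    return fwrite(mapped, 1, len, out);
}
```

With 4096-byte pages, zero_filled_tail(13, 4096) is 4083, which matches seeing '\0' at mapped[13]. For a 4096-byte file it is 0, so the "%s" version would run off the end of the mapping, while write_mapped(stdout, mapped, ss.st_size) stays within it.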