I wrote my own malloc, new, and realloc for my C++ project. Some of the allocations are >= 4K. When I call my malloc, is there a way to zero out a 4K page without reading its old data into cache? I vaguely remember reading about something like this in either the Intel or AMD x86-64 documentation, but I can't remember what it's called.
Does gcc (or clang) have an intrinsic I can use? If not, what assembly instructions should I look up? I have 3 common use cases after a malloc: zeroing the memory, memcpy-ing a buffer, and mixing both (64 bytes or 512 bytes of memcpy, then the rest as zeros). I'm not sure what the minimum architecture I'll support will be, but it's no older than Haswell. Likely it'll be Intel Skylake/AMD Zen and up.
-Edit- I rolled back the C++ tag to C because intrinsics are generally in C
CodePudding user response:
On Unix systems you can mmap /dev/zero to get zero-filled pages. That gives you zeroed pages for sure. MAP_ANONYMOUS mappings are also zero-filled on Linux (the mmap man page says their contents are initialized to zero); other kernels may differ. Neither way should poison the caches.
You can also use MAP_POPULATE (Linux) to allocate physical pages from the start instead of faulting them in on first access. Hopefully this wouldn't poison the caches either but I never verified that in the Linux source.
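As a rough sketch of the mmap approach described above (not a drop-in allocator), something like the following could back a page-sized allocation. The helper names alloc_zeroed_pages and free_pages and the prefault parameter are made up for illustration. It assumes a POSIX system with MAP_ANONYMOUS, and it only uses MAP_POPULATE where the headers define it. Whether the kernel's own zeroing avoids touching the caches is not something this code can guarantee.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* exposes MAP_ANONYMOUS / MAP_POPULATE on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: returns len bytes of zero-filled memory, or NULL.
 * If prefault is nonzero and MAP_POPULATE exists (Linux), the physical
 * pages are allocated up front instead of on first access. */
static void *alloc_zeroed_pages(size_t len, int prefault)
{
    assert(len > 0);                      /* mmap rejects a zero length */
    int flags = MAP_PRIVATE | MAP_ANONYMOUS;
#ifdef MAP_POPULATE
    if (prefault)
        flags |= MAP_POPULATE;
#else
    (void)prefault;                       /* flag not available here */
#endif
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, flags, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: returns the pages to the kernel; 0 on success. */
static int free_pages(void *p, size_t len)
{
    return munmap(p, len);
}
```

A custom malloc could call alloc_zeroed_pages(4096, 0) for large requests and skip its own zeroing for memory that has never been handed out before. Memory recycled inside the allocator would still need to be cleared some other way.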
But I have to wonder: why would you zero out the pages on malloc/realloc/new? Only calloc zeroes out memory; for everything else the compiler or the source code will initialize it. Unless you change the compiler to know that you have already zeroed the pages, there won't be any benefit.
Note: For many types in C++ the memory will not be zeroed out at all but initialized by the constructors.
CodePudding user response:
I think rep stosb meets your needs. Even though it does 1-byte writes, it uses write combining internally, so it will fill a full cache line before issuing a write. Then since an entire cache line is being written, it doesn't need to read the dead contents of memory before writing the line to L1.
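A minimal sketch of how rep stosb could be issued from C with GCC or Clang inline assembly, assuming an x86-64 target. The function name zero_rep_stosb is made up for illustration, and the memset fallback for other targets is my addition so the file still builds elsewhere. Whether the CPU actually skips reading the old line contents depends on the microarchitecture and the size of the fill, so it is worth measuring rather than assuming.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: zero n bytes starting at dst. */
static inline void zero_rep_stosb(void *dst, size_t n)
{
    assert(dst != NULL || n == 0);
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    /* rep stosb stores AL to [RDI], RCX times, advancing RDI.
     * The x86-64 System V ABI guarantees the direction flag is clear
     * on function entry, so the fill runs forward. */
    __asm__ volatile("rep stosb"
                     : "+D"(dst), "+c"(n)  /* RDI and RCX are read and modified */
                     : "a"(0)              /* AL = 0, the fill byte */
                     : "memory");          /* the asm writes to memory */
#else
    memset(dst, 0, n);                     /* fallback on other targets */
#endif
}
```

For the mixed case from the question, one option is to memcpy the first 64 or 512 bytes and then call zero_rep_stosb on the remainder of the page, so that the zero fill covers whole cache lines.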