Syscall mremap() does not work as I expected. Is it bugged, or is there any other way of remapping memory?

My example uses the non-POSIX mremap() call to join chunks of anonymous memory allocated with mmap() into one contiguous region. Based on the available documentation I expected this to work. However, the mremap() 'grow' operation unexpectedly fails, most likely because of the in-kernel representation of a memory chunk (known as VM/VMA): a region that is contiguous in userspace is still two separate VMAs inside the kernel.

The algorithm is as follows:

Given memory A and B, grow A to size (A + B) (A may be moved)
Move B to the newly added space at the end of A
Given memory C as well, grow A to size (A + B + C) (this step fails: EFAULT 14 Bad address)
Move C to the newly added space at the end of A

Exact code is available on github.

Since the 'grow' call works the first time but not the second, I conclude that the problem with the second call is that the memory is made up of two separate in-kernel regions, so mremap() cannot handle such an argument.

This diagnosis seems to have some support in the kernel comments, though I am not sure how to interpret this check in mremap_to() in mm/mremap.c:

        /* We can't remap across vm area boundaries */
        if (old_len > vma->vm_end - addr)
                return ERR_PTR(-EFAULT);

On the other hand, there is no such requirement in the documentation, and it would be surprising to make the user space call dependent on an internal kernel representation.

We may also read in documentation https://man7.org/linux/man-pages/man2/mremap.2.html:

EFAULT Some address in the range old_address to
       old_address+old_size is an invalid virtual memory address
       for this process.  You can also get EFAULT even if there
       exist mappings that cover the whole address space
       requested, but those mappings are of different types.

The use of the plural 'mappings' clearly suggests that it should be possible to pass an area composed of multiple mappings (provided their types are the same).

So can someone help me on this:

Is mremap() bugged, or am I using it incorrectly?
Or maybe this is a call that "does what author needed it for" and my expectations are unrealistic?
Is there any other way of remapping arbitrary memory in userspace?

CodePudding user response：

I have reproduced the interesting part of your code:

// This test attaches new mapped memory to 'a' chunk.
// However, it fails at second time for unclear reason.
// First, let's create initial mmapped region 'a'
void* a = mmap(NULL, size, PROT_EXEC | PROT_READ | PROT_WRITE, 
        MAP_ANONYMOUS | MAP_PRIVATE | MAP_POPULATE, /*not a file mapping*/-1, 0);
err(a, 1);

// Creation of 'b' and attaching it at the end of 'a'
// 'a' need to be enlarged first, relocation is possible
void* b = mmap(NULL, size, PROT_EXEC | PROT_READ | PROT_WRITE, 
        MAP_ANONYMOUS | MAP_PRIVATE | MAP_POPULATE, /*not a file mapping*/-1, 0);
err(b, 2);    
a = mremap(a, size, 2 * size, MREMAP_MAYMOVE);                      // Grow 'a'
err(a, 3);
b = mremap(b, size, size, MREMAP_MAYMOVE | MREMAP_FIXED, a + size); // Attach 'b' at the end
err(b, 4);

At this point you have two 4K mappings, with 'a' immediately followed by 'b'. Placing 'b' there with MREMAP_FIXED replaced the second half of 'a', shrinking 'a' back to 4K.

// Creation of 'c' and attaching it at the end of 'a'
// 'a' need to be enlarged first, relocation is possible
void* c = mmap(NULL, size, PROT_EXEC | PROT_READ | PROT_WRITE, 
        MAP_ANONYMOUS | MAP_PRIVATE | MAP_POPULATE, /*not a file mapping*/-1, 0);
err(c, 5);    
a = mremap(a, 2 * size, 3 * size, MREMAP_MAYMOVE);                   // <- this fails: EFAULT 14 Bad address

This fails because 'a' is only 4K, not 8K. With the following change the call succeeds:

a = mremap(a, size, 3 * size, MREMAP_MAYMOVE);

CodePudding user response：

When you move the new mapping, this is what happens:

   ┌───┐
0. │ A │                 Existing mapping.
   └───┘
   ┌───┐         ┌───┐
1. │ A │     ... │ B │   Create new mapping.
   └───┘         └───┘
   ┌───────┐     ┌───┐
2. │ A   _ │ ... │ B │   Extend original mapping address range.
   └───────┘     └───┘
   ┌───┬───┐
3. │ A │ B │             Move new mapping over extended address range.
   └───┴───┘
   ┌───┬───┐     ┌───┐
4. │ A │ B │ ... │ C │   Create new mapping.
   └───┴───┘     └───┘
   ┌───┬───┐     ┌───┐
5. │ A │ B │ ... │ C │   Cannot extend mapping A, because B is in the way.
   └───┴───┘     └───┘

I.e., even though the mappings do end up at contiguous addresses, moving B over the extended part of A does not "merge" them; it only moves B. They stay separate mappings.

If you read the kernel pseudofile /proc/self/maps, you'll see that your mapping region is indeed split into multiple mappings. I am not aware of any way for userspace to ask the kernel to coalesce different mappings into a single one, aside from just doing a mmap(...MAP_FIXED...) over the entire region, but that will clear the existing contents to zero.
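To check this yourself, something along these lines could count how many /proc/self/maps entries overlap a given range. It is only a sketch: count_mappings() is a name I made up, and it assumes /proc is mounted and that each line of the file starts with "start-end" in hexadecimal.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not part of any library): count the entries in
 * /proc/self/maps that overlap [addr, addr + len).
 * Returns -1 if the file cannot be opened. */
static int count_mappings(const void *addr, size_t len)
{
    assert(len > 0);

    FILE *f = fopen("/proc/self/maps", "r");
    if (!f)
        return -1;

    uintptr_t lo = (uintptr_t)addr;
    uintptr_t hi = lo + len;
    char line[8192];            /* assumed large enough for one entry */
    int count = 0;

    while (fgets(line, sizeof line, f)) {
        uintptr_t start, end;

        /* Each entry begins with "start-end" in hex; skip anything else. */
        if (sscanf(line, "%" SCNxPTR "-%" SCNxPTR, &start, &end) != 2)
            continue;
        if (start < hi && end > lo)
            count++;
    }
    fclose(f);
    return count;
}
```

Calling count_mappings(a, 2 * size) right after step 3 should tell you whether the kernel sees one mapping or two. A result greater than 1 is the situation in which the mremap() in step 5 returns EFAULT.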

You can try growing only the final mapping in the region (B in step 5 above), but only with zero flags (without MREMAP_MAYMOVE or MREMAP_FIXED). If that fails, you need to call mmap(NULL, total_grown_length, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE | MAP_NORESERVE, -1, 0) to obtain a completely new virtual address range, into which you can mremap() each of the mappings you already have, one by one. (Note that this mmap() call does not reserve RAM, only address space.) If that also fails, there simply isn't enough virtual address space left.
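A minimal sketch of the "reserve a new range, then move everything into it" fallback might look like the following. gather_chunks() is a hypothetical helper, and it assumes every chunk is page-aligned, exactly len bytes long, and that len is a multiple of the page size.

```c
#define _GNU_SOURCE             /* for mremap() and MREMAP_* */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: move n mappings of len bytes each into one freshly
 * reserved, contiguous address range. On success chunks[i] is updated to
 * the new address of chunk i and the base of the range is returned.
 * On failure MAP_FAILED is returned; chunks already moved stay moved. */
static void *gather_chunks(void *chunks[], size_t n, size_t len)
{
    assert(n > 0 && len > 0);

    /* Reserve address space only: PROT_NONE + MAP_NORESERVE uses no RAM. */
    char *base = mmap(NULL, n * len, PROT_NONE,
                      MAP_ANONYMOUS | MAP_PRIVATE | MAP_NORESERVE, -1, 0);
    if (base == MAP_FAILED)
        return MAP_FAILED;

    for (size_t i = 0; i < n; i++) {
        void *dst = base + i * len;

        /* MREMAP_FIXED replaces whatever is mapped at dst (here: a piece
         * of the reservation) with the pages of chunks[i]. */
        if (mremap(chunks[i], len, len,
                   MREMAP_MAYMOVE | MREMAP_FIXED, dst) == MAP_FAILED)
            return MAP_FAILED;
        chunks[i] = dst;
    }
    return base;
}
```

Keep in mind that this only makes the addresses contiguous. As far as I know the chunks can still remain separate mappings afterwards, so to leave room for growth you would reserve total_grown_length up front rather than n * len.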

If the new mappings are anonymous memory, then the obvious solution is to just use memcpy() to copy the data from the separate mapping into the extended original, and then unmap the separate mapping. Or better yet, just extend the original mapping in the first place, and use the extended area.
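The copy-based variant could be sketched like this. append_by_copy() is a made-up name, and it assumes that both mappings are anonymous and private, that A is writable, and that both lengths are multiples of the page size.

```c
#define _GNU_SOURCE             /* for mremap() and MREMAP_MAYMOVE */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: grow mapping a from a_len to a_len + b_len bytes,
 * copy b into the new tail, then unmap b. Returns the (possibly moved)
 * address of a, or MAP_FAILED, in which case a and b are left untouched. */
static void *append_by_copy(void *a, size_t a_len, void *b, size_t b_len)
{
    assert(a_len > 0 && b_len > 0);

    /* a is still a single mapping here, so growing it is allowed. */
    void *grown = mremap(a, a_len, a_len + b_len, MREMAP_MAYMOVE);
    if (grown == MAP_FAILED)
        return MAP_FAILED;

    memcpy((char *)grown + a_len, b, b_len);
    munmap(b, b_len);
    return grown;
}
```

Because A is only ever grown by mremap() and never has another mapping moved over part of it, calling this again for C, as in a = append_by_copy(a, 2 * size, c, size), should not run into the EFAULT from the question. The cost is one memcpy() per appended chunk.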