On ARM, after writing instructions to memory, a memory barrier is needed before executing the instructions. Specifically: clean the data cache, invalidate the instruction cache, then execute an instruction synchronization barrier (`ISB`) on the CPU that will execute the code.
One can use `cp` to copy an executable or shared library and then execute it without an explicit memory barrier. This amounts to:
- Open the file.
- Write to the file with `write`.
- Close the file.
- Open the file.
- Map the file with `mmap` using `PROT_READ | PROT_EXEC`.
- Execute the code.
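The sequence above can be sketched in C as follows. This is only an illustration under stated assumptions: the function name `write_then_map_exec`, the `payload` bytes (a tiny routine that returns 42, hand-encoded for AArch64 and x86-64), and the use of `MAP_PRIVATE` are not from the question, and the target filesystem must not be mounted `noexec`.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical payload: machine code for "return 42". */
#if defined(__aarch64__)
static const unsigned char payload[] = {
    0x40, 0x05, 0x80, 0x52, /* mov w0, #42 */
    0xc0, 0x03, 0x5f, 0xd6, /* ret */
};
#elif defined(__x86_64__)
static const unsigned char payload[] = {
    0xb8, 0x2a, 0x00, 0x00, 0x00, /* mov eax, 42 */
    0xc3,                         /* ret */
};
#else
#error "this sketch only carries payloads for AArch64 and x86-64"
#endif

/* Returns the payload's result, or -1 if any step fails. */
int write_then_map_exec(const char *path)
{
    /* Open, write with write(), close. */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0700);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, payload, sizeof payload);
    if (close(fd) != 0 || n != (ssize_t)sizeof payload)
        return -1;

    /* Reopen and map with PROT_READ | PROT_EXEC. */
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    void *p = mmap(NULL, sizeof payload, PROT_READ | PROT_EXEC,
                   MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;

    /* Execute: note there is no explicit cache maintenance here. */
    int (*fn)(void) = (int (*)(void))p;
    int result = fn();
    munmap(p, sizeof payload);
    return result;
}
```

Calling `write_then_map_exec("/tmp/demo.bin")` (the path is arbitrary) should return 42 when every step succeeds.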
Likewise, one can presumably use `mmap` to write to the file:
- Open the file.
- Map the file with `mmap` using `PROT_READ | PROT_WRITE` and `MAP_SHARED`.
- Write to the mapping with normal memory writes.
- Unmap the file with `munmap`.
- Close the file.
- Open the file.
- Map the file with `mmap` using `PROT_READ | PROT_EXEC`.
- Execute the code.
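A rough C sketch of this second sequence is shown below. The function name `mmap_write_then_map_exec` and the `payload` bytes are hypothetical, and the `ftruncate` call is an added assumption: a newly created file has to be given a non-zero size before a shared mapping of it can be written.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical payload: machine code for "return 42". */
#if defined(__aarch64__)
static const unsigned char payload[] = {
    0x40, 0x05, 0x80, 0x52, /* mov w0, #42 */
    0xc0, 0x03, 0x5f, 0xd6, /* ret */
};
#elif defined(__x86_64__)
static const unsigned char payload[] = {
    0xb8, 0x2a, 0x00, 0x00, 0x00, /* mov eax, 42 */
    0xc3,                         /* ret */
};
#else
#error "this sketch only carries payloads for AArch64 and x86-64"
#endif

/* Returns the payload's result, or -1 if any step fails. */
int mmap_write_then_map_exec(const char *path)
{
    /* Open and size the file (sizing is an assumption of this sketch). */
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0700);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)sizeof payload) != 0) {
        close(fd);
        return -1;
    }

    /* Map shared and writable, then store the code with plain writes. */
    void *w = mmap(NULL, sizeof payload, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    if (w == MAP_FAILED) {
        close(fd);
        return -1;
    }
    memcpy(w, payload, sizeof payload);

    /* Unmap and close. */
    munmap(w, sizeof payload);
    close(fd);

    /* Reopen and map with PROT_READ | PROT_EXEC. */
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    void *x = mmap(NULL, sizeof payload, PROT_READ | PROT_EXEC,
                   MAP_PRIVATE, fd, 0);
    close(fd);
    if (x == MAP_FAILED)
        return -1;

    /* Execute, again without explicit cache maintenance. */
    int (*fn)(void) = (int (*)(void))x;
    int result = fn();
    munmap(x, sizeof payload);
    return result;
}
```

As with the `write` variant, a successful run should return 42.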
So where in the above steps is the necessary cache manipulation hiding? Is it in `munmap` or in `mmap`? Assume there is no disk access.
Presumably, if neither `munmap` nor `mmap` is called between writing and execution, explicit cache synchronisation is needed with a call to `__clear_cache`, but can this be done with either mapping?
Answer:
It's done by the `mmap` system call. When you map pages executable, the kernel has to ensure that when the system call returns, those pages are ready to execute.
To do its work, `mmap` will have to update the page table. In the function `__set_pte_at`, which I presume from its name is called whenever a page table entry is updated, we have a call to `__sync_icache_dcache`, which, if you trace it down, should eventually execute the relevant `ic` and `dc` instructions.
That leaves the `isb`. Its purpose is to be a "context synchronization event" (see the Architecture Reference Manual under "Synchronization and coherency issues between data and instruction accesses"), which is what is needed to flush any prefetched instructions. But the `eret` instruction that returns from the kernel to user space already counts as such an event, so that's automatically taken care of.
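For reference, here is a minimal user-space sketch of the same maintenance sequence on AArch64: clean the data cache, invalidate the instruction cache, then `isb`. The function name `sync_code_range` is hypothetical, and the `dsb ish` barriers between the steps are my addition based on the architecture's requirements rather than something stated above. It assumes the OS permits `dc cvau`, `ic ivau`, and reads of `CTR_EL0` from user space, as Linux does. On other architectures the sketch simply falls back to the compiler builtin.

```c
#include <stddef.h>
#include <stdint.h>

/* Make instructions written to [start, start + len) visible to
 * instruction fetch on the calling CPU. */
void sync_code_range(void *start, size_t len)
{
#if defined(__aarch64__)
    uint64_t ctr;
    /* CTR_EL0 holds log2 of the smallest cache line sizes, in words. */
    __asm__ volatile("mrs %0, ctr_el0" : "=r"(ctr));
    uintptr_t dline = (uintptr_t)4 << ((ctr >> 16) & 0xf); /* DminLine */
    uintptr_t iline = (uintptr_t)4 << (ctr & 0xf);         /* IminLine */
    uintptr_t begin = (uintptr_t)start;
    uintptr_t end = begin + len;

    /* Clean each data cache line to the point of unification. */
    for (uintptr_t a = begin & ~(dline - 1); a < end; a += dline)
        __asm__ volatile("dc cvau, %0" : : "r"(a) : "memory");
    __asm__ volatile("dsb ish" : : : "memory");

    /* Invalidate each instruction cache line covering the range. */
    for (uintptr_t a = begin & ~(iline - 1); a < end; a += iline)
        __asm__ volatile("ic ivau, %0" : : "r"(a) : "memory");
    __asm__ volatile("dsb ish" : : : "memory");

    /* Context synchronization: discard already-fetched instructions. */
    __asm__ volatile("isb" : : : "memory");
#else
    /* Elsewhere, defer to the GCC/Clang builtin. */
    __builtin___clear_cache((char *)start, (char *)start + len);
#endif
}
```

In practice `__builtin___clear_cache` is usually the better choice, since the compiler runtime handles per-CPU details that this sketch ignores.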
The same thing would happen if you replaced steps 4-7 of your second sequence (unmap, close, reopen, and remap) with a call to `mprotect`, to simply set execute permission on the memory that's already there.
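A small sketch of the `mprotect` variant follows. It is an illustration only: the function name `write_then_mprotect_exec` and the `payload` bytes are hypothetical, and it uses an anonymous mapping instead of a file for brevity. It relies on the reasoning above that the kernel does the cache maintenance when it makes the pages executable. Adding a `__builtin___clear_cache` call before the `mprotect` would be a harmless extra precaution.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical payload: machine code for "return 42". */
#if defined(__aarch64__)
static const unsigned char payload[] = {
    0x40, 0x05, 0x80, 0x52, /* mov w0, #42 */
    0xc0, 0x03, 0x5f, 0xd6, /* ret */
};
#elif defined(__x86_64__)
static const unsigned char payload[] = {
    0xb8, 0x2a, 0x00, 0x00, 0x00, /* mov eax, 42 */
    0xc3,                         /* ret */
};
#else
#error "this sketch only carries payloads for AArch64 and x86-64"
#endif

/* Returns the payload's result, or -1 if any step fails. */
int write_then_mprotect_exec(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);

    /* Start with a writable, non-executable mapping. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    memcpy(p, payload, sizeof payload);

    /* Flip the same memory to read + execute. */
    if (mprotect(p, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(p, len);
        return -1;
    }

    int (*fn)(void) = (int (*)(void))p;
    int result = fn();
    munmap(p, len);
    return result;
}
```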
In the case of `munmap`, a cache flush shouldn't actually be needed. It is going to unmap those pages and flush the TLBs, and it gets a free context synchronization event from `eret` as before. That means that any future attempt to execute at that address will take a page fault, and so it doesn't matter what bits are actually in the icache line.
As far as I know, the only time you would need to do the cache flushing and `isb` in user space is if you want to write the code into some block of memory and then immediately branch there, like a JIT compiler might do, without an `mmap`/`mprotect` call in between. That could be done with a `PROT_WRITE | PROT_EXEC` mapping (though I'm not sure if all kernels allow that these days), or by "aliasing": mapping the same physical memory at two different virtual addresses, with different permissions. See "AArch64 memory synchronization operations on multiply-mapped addresses" and "Synchronizing caches for JIT/self-modifying code on ARM".
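The JIT-style case can be sketched as below, using a single mapping that is both writable and executable. The function name `jit_write_and_run` and the `payload` bytes are hypothetical, and the sketch assumes the kernel and any security policy in force allow such a mapping. Because no `mmap` or `mprotect` call happens between the write and the branch, the code calls the GCC/Clang builtin `__builtin___clear_cache` explicitly. The aliasing variant is not shown.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical payload: machine code for "return 42". */
#if defined(__aarch64__)
static const unsigned char payload[] = {
    0x40, 0x05, 0x80, 0x52, /* mov w0, #42 */
    0xc0, 0x03, 0x5f, 0xd6, /* ret */
};
#elif defined(__x86_64__)
static const unsigned char payload[] = {
    0xb8, 0x2a, 0x00, 0x00, 0x00, /* mov eax, 42 */
    0xc3,                         /* ret */
};
#else
#error "this sketch only carries payloads for AArch64 and x86-64"
#endif

/* Returns the payload's result, or -1 if the mapping is refused. */
int jit_write_and_run(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);

    /* One mapping, writable and executable at the same time. */
    unsigned char *p = mmap(NULL, len,
                            PROT_READ | PROT_WRITE | PROT_EXEC,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    /* Write the code with ordinary stores. */
    memcpy(p, payload, sizeof payload);

    /* Explicit synchronisation: nothing else will do it for us here.
     * This is a no-op on x86-64 and real cache maintenance on ARM. */
    __builtin___clear_cache((char *)p, (char *)p + sizeof payload);

    /* Branch straight into the freshly written code. */
    int (*fn)(void) = (int (*)(void))p;
    int result = fn();
    munmap(p, len);
    return result;
}
```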