Consequence of violating macOS's ARM64 calling convention

I'm porting some AArch64/ARM64/Apple Silicon assembly code from Linux to macOS.

This code uses all 31 available registers (stack pointer doesn't count) to avoid almost all cases of spilling; the Linux calling conventions allow me to use that many registers.

If pressed, I would admit that spilling one extra register (thus bringing it down to 30 registers used) is feasible, as performance would be only minimally affected; but if restricted to 29 or fewer registers, performance would suffer considerably more. Thus, I'd really like to have at least 30 registers available, and ideally 31.

I just learned from an official Apple document that two extra registers are reserved, beyond what the Linux calling conventions require:

Respect the Purpose of Specific CPU Registers

The ARM standard delegates certain decisions to platform designers. Apple platforms adhere to the following choices:

The platforms reserve register x18. Don’t use this register.

The frame pointer register (x29) must always address a valid frame record. Some functions — such as leaf functions or tail calls — may opt not to create an entry in this list. As a result, stack traces are always meaningful, even without debug information.

Despite these requirements, my code appears to run fine while using both x18 and x29.

Now, I fully understand that ignoring such ABI requirements is a Very Bad Thing (TM). However, I'd like to understand exactly how using x18 and x29, respectively, could break the code.

For instance, from reading the above documentation, my understanding is that x29 is there to support debugging and crash dumps. Suppose I didn't care about debugging this particular function (which I actually don't), or about whether any generated crash dumps are meaningful. In that case, is there any harm in using x29?

As for x18, any idea what it is used for? I'd hypothesize (with zero supporting evidence) that if an interrupt or context switch occurs while this code is running, x18 is not saved and restored, which would corrupt the results of my function. That would be a more serious problem, and in that case I'd heed the advice not to use x18.

Also note that the code in question is a leaf function, so there is no issue with breaking any functions called from within it.

Answer:

I think it's fine to use x29 for anything you like if you don't want working backtraces.
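To make "working backtraces" concrete, here is a minimal sketch of how a frame-pointer unwinder might follow x29 on Apple arm64. It assumes the AArch64 frame record layout (the caller's saved x29 followed by the saved return address from x30) that the Apple document refers to; `struct frame_record` and `walk_frame_records` are hypothetical names, and real unwinders (crash reporters, profilers, debuggers) validate much more, such as the thread's stack bounds. A profiler or crash reporter would start from the interrupted thread's saved x29 rather than from its own frame as this sketch does; if that thread was in a leaf function using x29 as scratch, the walk starts from a garbage value and the backtrace is lost.

```c
#include <stddef.h>
#include <stdint.h>

/* AArch64 frame record: saved x29 (caller's frame pointer), then saved x30 (lr). */
struct frame_record {
    const struct frame_record *prev; /* saved x29 */
    uintptr_t lr;                    /* saved x30; may carry PAC bits on arm64e */
};

/* Hypothetical helper: store up to `max` return addresses by following x29.
   Returns the number stored; always 0 when not built for Apple arm64. */
size_t walk_frame_records(uintptr_t *out, size_t max)
{
    size_t n = 0;
#if defined(__APPLE__) && (defined(__aarch64__) || defined(__arm64__))
    const struct frame_record *fr = __builtin_frame_address(0);
    while (fr != NULL && n < max) {
        out[n++] = fr->lr;
        /* The stack grows downward, so each caller's record must live at a
           higher address. Anything else (including NULL at the outermost
           frame) ends the walk; a function that used x29 as scratch would
           break the chain here or earlier. */
        if ((uintptr_t)fr->prev <= (uintptr_t)fr)
            break;
        fr = fr->prev;
    }
#else
    (void)out;
    (void)max;
#endif
    return n;
}
```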

When Apple started out with aarch64 on iOS, they didn't yet have any specific use in mind for x18, but they wanted to keep it reserved so that people wouldn't accidentally end up relying on it. So back then, the kernel clobbered x18, setting it to some nonzero bogus value on every context switch, just to make it very clear to everybody that you can't use it.

Since the arrival of macOS on Apple Silicon, they have removed the code that intentionally clobbers this register. I'm not sure whether this is because they have actually found a specific use for the register somewhere (so the kernel can't clobber it) or have just chosen to play nicer with it. (It does help, for example, userspace emulation of Windows executables with Wine: on Windows, x18 is supposed to always point to the TEB, the Thread Environment Block.)

So while x18 does seem to be usable in practice on macOS today, it's not certain that this holds on iOS, and the platform may still use it for something, so I would recommend against using it.

Answer:

You can safely clobber x29 if broken backtraces are an acceptable loss to you.

x18 is a different story.
On macOS, Rosetta uses it, so Apple can't clobber it anymore without at least refactoring that. They also have a kernel test to make sure x18 is restored "on hardware that supports it". So far, that includes all hardware that supports arm64 macOS, and all macOS versions that support Apple Silicon have this behaviour enabled.
On iOS though, there is hardware that does not support it, specifically the A11 chip series and older. On those chips, the kernel is configured with __ARM_KERNEL_PROTECT__, which enables a Spectre mitigation that uses x18 on all exception handlers, even async ones, before the kernel gets a chance to spill any registers. So unless you're running with interrupts off, your x18 can be zeroed at any point in time. In addition, even on A12 and later, iOS versions before iOS 14.0 did clobber x18 intentionally.

Now if you checked out the linked test, you might be tempted to check the sysctl hw.optional.arm_kernel_protect at runtime, but unfortunately that is only exported on DEVELOPMENT and DEBUG configurations of XNU.
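For illustration, a runtime query of that sysctl might look like the sketch below, using `sysctlbyname()` from `<sys/sysctl.h>`. It assumes the value is exported as an `int` when present; `query_arm_kernel_protect` is a hypothetical helper name. On a RELEASE kernel the call simply fails, so a `-1` result tells you nothing about whether the mitigation is active, which is why this check can't be relied on in shipping code.

```c
#if defined(__APPLE__)
#include <stddef.h>
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Hypothetical helper: returns the sysctl's value (0 or 1) when it is
   exported, or -1 when it is missing (e.g. RELEASE kernels) or when this
   is not an Apple platform. */
int query_arm_kernel_protect(void)
{
#if defined(__APPLE__)
    int value = 0;
    size_t len = sizeof value;
    if (sysctlbyname("hw.optional.arm_kernel_protect", &value, &len, NULL, 0) != 0)
        return -1; /* not exported on this kernel configuration */
    return value != 0;
#else
    return -1;
#endif
}
```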

So if you're targeting iOS, you cannot use x18. If you're targeting macOS, you can use it for the time being, but that may change in the future. You could try to detect such a change by doing the same thing the test does: set x18 to a known value, call sched_yield(), then check the value. But again, that relies on all exceptions treating x18 the same way, and while they currently do, that too may change in the future.
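The detection idea can be sketched as follows, modeled loosely on the test's approach rather than copied from it. `x18_survives_yield` is a hypothetical name and the marker values are arbitrary. It repeats the write, yield, read cycle many times because a single `sched_yield()` is not guaranteed to trigger a context switch. The probe only runs on Apple arm64 builds, where the compiler never allocates x18, so any change between the two `asm` statements has to come from outside compiled C code (in practice, the kernel).

```c
#include <sched.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if x18 survived every yield, 0 if it was
   changed at least once, and -1 when not built for Apple arm64. */
int x18_survives_yield(unsigned iterations)
{
#if defined(__APPLE__) && (defined(__aarch64__) || defined(__arm64__))
    for (unsigned i = 0; i < iterations; i++) {
        uint64_t expected = 0x5A5A000000000000ULL + i; /* arbitrary marker */
        uint64_t actual;
        /* Apple's ABI reserves x18, so compiled code (ours and libc's)
           leaves it alone between these two statements. */
        __asm__ volatile("mov x18, %0" : : "r"(expected));
        sched_yield();
        __asm__ volatile("mov %0, x18" : "=r"(actual));
        if (actual != expected)
            return 0;
    }
    return 1;
#else
    (void)iterations;
    return -1;
#endif
}
```

A result of 1 is only evidence about the running system, not a guarantee for future OS versions or other exception paths.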