Is the caller or callee responsible for freeing shadow store in x64 assembly (windows)?

Time:08-01

Coming from C and C++, I have recently started learning x86-64 assembly to better understand how my programs work.

I know that the convention in x64 assembly is to reserve 32 bytes of 'shadow store' on the stack before calling a function (by doing: subq $0x20, %rsp).

What I am unsure about is: is the callee responsible for incrementing %rsp again, or the caller?

In other words (using printf as an example), would number 1 or number 2 be correct (or perhaps neither :P)?

1.

subq $0x20, %rsp
movabsq $msg, %rcx
callq printf

2.

subq $0x20, %rsp
movabsq $msg, %rcx
callq printf
addq $0x20, %rsp

(... where msg is an ascii string stored in the .data section that I am passing to printf)

I am on Windows 10, using GAS as my assembler.

Any help would be much appreciated, cheers.

CodePudding user response:

Deallocating shadow space is the caller's responsibility.

But normally you'd do it once per function, not once per call-site within a function. Usually you just move RSP once (maybe after some pushes) and leave it alone until you're ready to return. That includes making room to store stack args if any for functions with more than 4 args.
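As a rough sketch of how that single RSP adjustment can be sized (my own illustration, not taken from the ABI docs): frame_bytes and its parameters are hypothetical names, locals are ignored, and it assumes the usual rule that RSP is 16-byte aligned at every call instruction. It adds the 32 bytes of shadow space, any stack-arg slots beyond the first 4 args, and whatever padding restores alignment.

```c
#include <assert.h>

/* Hypothetical helper: bytes to subtract from RSP once in the prologue so
 * every call in the function has shadow space, room for stack args, and a
 * 16-byte-aligned RSP. Space for local variables is not modelled here.
 *
 * pushed_regs:     8-byte registers pushed before the sub
 * max_callee_args: largest integer/pointer arg count of any call made
 */
static unsigned frame_bytes(unsigned pushed_regs, unsigned max_callee_args)
{
    unsigned shadow = 32;  /* always reserved, even for 0-arg callees */
    unsigned stack_args =
        max_callee_args > 4 ? (max_callee_args - 4) * 8 : 0;
    unsigned need = shadow + stack_args;

    /* On entry RSP % 16 == 8, because call just pushed a return address;
     * each push moves RSP by another 8 bytes. */
    unsigned below_aligned = 8 + pushed_regs * 8;

    /* Pad so the total distance from the aligned point is a multiple of 16. */
    unsigned pad = (16 - (below_aligned + need) % 16) % 16;

    assert((below_aligned + need + pad) % 16 == 0);
    return need + pad;
}
```

With no pushes and at most 4 register args, frame_bytes(0, 2) gives 40: 32 bytes of shadow space plus 8 of padding, which is where the subq $40 in the GCC output further down comes from. After one push, frame_bytes(1, 2) gives 32.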

In the Windows x64 calling convention (and in x86-64 System V), the callee must return without changing the caller's RSP, i.e. with ret, not ret 32, and without having copied the return address somewhere else.
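To make the bookkeeping concrete, here is a toy model that tracks only the value of RSP through the two sequences from the question. It is a sketch, not real code generation: Cpu, call_and_ret, option1_leak and option2_leak are hypothetical names, and no stack memory is simulated.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: only the stack pointer value is tracked. */
typedef struct { uint64_t rsp; } Cpu;

static void sub_rsp(Cpu *c, uint64_t n) { c->rsp -= n; }
static void add_rsp(Cpu *c, uint64_t n) { c->rsp += n; }

/* call pushes an 8-byte return address; a plain ret pops exactly that.
 * The callee may scribble on the 32 bytes above its return address,
 * but it never frees them. */
static void call_and_ret(Cpu *c)
{
    c->rsp -= 8;  /* call: push return address */
    c->rsp += 8;  /* ret: pop return address, nothing more */
}

/* Sequence 1 from the question: sub, call, no add.
 * Returns how many bytes RSP ends up below where it started. */
static uint64_t option1_leak(uint64_t start)
{
    Cpu c = { start };
    sub_rsp(&c, 0x20);
    call_and_ret(&c);
    return start - c.rsp;
}

/* Sequence 2 from the question: sub, call, add. */
static uint64_t option2_leak(uint64_t start)
{
    Cpu c = { start };
    sub_rsp(&c, 0x20);
    call_and_ret(&c);
    add_rsp(&c, 0x20);
    return start - c.rsp;
}
```

In this model sequence 1 leaves RSP 32 bytes lower after every call, while sequence 2 leaves it balanced, which is what the caller needs before its own ret.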

Microsoft has some examples at https://docs.microsoft.com/en-us/cpp/build/prolog-and-epilog?view=msvc-170#epilog-code
and specifically documents that RSP mustn't be changed by functions:

The x64 ABI considers registers RBX, RBP, RDI, RSI, RSP, R12, R13, R14, R15, and XMM6-XMM15 nonvolatile. They must be saved and restored by a function that uses them.

(You also need to emit unwind metadata for every instruction that moves the stack pointer, and for where you saved non-volatile aka call-preserved registers, if you want to be fully compliant with the ABI, including for SEH and C++ exception unwinding. Toy programs still work fine without it, as long as you don't expect C++ exceptions to work, or debuggers to unwind the stack back to the stack frame of a caller.)


You can see this in MSVC compiler output, e.g. https://godbolt.org/z/xh38jxWqT . For AT&T syntax, use gcc -O2 -mabi=ms, which tells GCC that all the functions it sees are __attribute__((ms_abi)) by default. That doesn't override the fact that it's targeting Linux, though: with -fPIE to make it use LEA instead of 32-bit absolute addressing for symbol addresses, we also get call printf@plt, not Windows-style calls to DLL functions.

But the stack management from GCC matches what MSVC -O2 also does.

#include <stdio.h>

void bar();
int foo(){
    printf("%d\n", 1);
    bar();
    return 1;  // make sure this isn't a tailcall
}
# gcc -O2 -mabi=ms  (but still sort of targeting Linux as far as dynamic linking)
.LC0:
        .string "%d\n"      ## in .rodata

foo():
        subq    $40, %rsp
        movl    $1,            