This is a repost of https://codereview.stackexchange.com/questions/278940/hello-world-in-x64-assembly-for-windows-shadow-space-stack-alignment/; it was suggested to me that Stack Overflow might have more fitting answers/comments.
I'm currently trying to delve into x64 assembly under Windows using NASM, and created a minimalistic "Hello World" application.
It is mainly meant as an educational resource for me and possibly others, hence the heavy documentation style.
A full repo with build instructions and code is located at hello_kernel32; this is the relevant source file:
;; Resources:
;; https://sonictk.github.io/asm_tutorial/
;; https://gist.github.com/mcandre/b3664ffbeb4f5764b36a397fafb04f1c
;; https://retroscience.net/x64-assembly.html
;; Make clear this file contains 64bit assembly
bits 64
;; Use rip-relative addressing
default rel
;; Export entry symbol (this is specified in the call to link.exe)
global _start
;; Import external symbols
;; (all of them exist in kernel32.lib, which gets passed to link.exe in addition to the program's object file hello.obj)
;;
;; Why import symbols/functions from kernel32.lib?
;;
;; In windows, the "low level API stack" is: Kernel < Syscalls < ntdll.dll < kernel32.dll (and others like user32.dll)
;;
;; * The kernel itself cannot be accessed by user programs for obvious reasons (CPU ring protection modes)
;; * Syscalls could be performed, but are undocumented and evidently unstable between different versions of windows
;; * ntdll.dll is only partially documented and not intended for external use
;; * kernel32.dll (and friends) are the "official" low-level entry points to the windows API
extern GetStdHandle
extern WriteFile
extern ExitProcess
;; This section contains read-only data
section .rodata
;; Store the output string followed by CRLF as a sequence of bytes, at address 'msg'
msg db "Hello World!", 0x0d, 0x0a
;; The length will be needed by the output function, and can be statically calculated at assembly time with 'equ'
;; It is actually a nifty trick that calculates the offset between the current address '$', and the address of 'msg'
;; See https://nasm.us/doc/nasmdoc3.html#section-3.2.4
msg_len equ $ - msg
;; This section contains the code
section .text
_start:
;; This will discard the return address on the stack which we don't need since we will never call `ret`,
;; but terminate via `call ExitProcess`.
;; It has the positive effect of aligning the stack to 16bytes for upcoming calls, and will provide _our_
;; shadow space to those called functions.
add rsp, 8;
;; For being able to print text, we first need to acquire a HANDLE to STDOUT
;; This HANDLE is a required parameter for the call to WriteFile
;; HANDLE = GetStdHandle(-11)
;;
;; See https://docs.microsoft.com/en-us/windows/console/getstdhandle
;;
;; Parameter 1 (rcx): requests the type of HANDLE, -11 is the constant for STDOUT
;; Return value (rax): HANDLE (an address with some type of meaning) is stored in rax, as per calling conventions
mov rcx, -11
call GetStdHandle
;; code = WriteFile(HANDLE, msg, msg_len, NULL, NULL)
;;
;; See https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-writefile
;;
;; Parameter 1 (rcx): HANDLE to write to
;; Parameter 2 (rdx): Address of message to print
;; Parameter 3 (r8): Length of message
;; Parameter 4 (r9): Write amount of written bytes to this address, null pointer
;; (Required according to docs when parameter 5 is null, but passing null seems to work just fine)
;; Parameter 5 (on stack): Unused optional parameter, null pointer
;; Return value (rax): Nonzero on success
mov rcx, rax
lea rdx, [msg]
mov r8, msg_len
mov r9, 0
mov qword [rsp + 32], 0 ;; We already allocated the shadow space in the prolog and can't use push.
call WriteFile
;; ExitProcess(code)
;;
;; See https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-exitprocess
;;
;; Parameter 1 (rcx): Exit code
mov rcx, rax
call ExitProcess
;; ExitProcess will internally issue the syscall for terminating the process after doing some cleanup
;; We messed with rsp in the prolog which would make a `ret` impossible anyway
My main question is about:
- Shadow space / stack alignment
Most resources I found seem to completely neglect this when doing simple "hello world" programs.
My program also seems to run just fine when removing all the sub rsp / add rsp statements.
So I'm wondering what the implications of not following those conventions really are for code correctness.
My current understanding is that GetStdHandle / WriteFile / ExitProcess are simply not using shadow space and also do not perform any operation that requires an aligned stack in their current implementation as present on my machine.
However, any update to kernel32.dll is free to change those implementations in a way that relies on shadow space being present and/or an aligned stack.
Therefore code neglecting shadow space/stack alignment for external calls is incorrect in the general sense of incorrectly interfacing with an external API, even though many APIs may be built in a way they can tolerate that slightly incorrect access (but this tolerance is an implementation detail that may change at any time).
-> Is this definition/understanding correct? / Anything to add/clarify?
- General question
Are there any other apparent mistakes / misunderstandings in the code or comments?
Answer:
No, those functions probably are writing into their shadow space; your program just doesn't depend on [rsp+0..31] being unmodified across a call.
You call ExitProcess instead of ret (because _start might not even have a return address on the stack? IDK if Windows does).
And you don't keep any local vars in your own stack space that becomes shadow space for callees, so they're not stepping on your local vars.
But yes, they are tolerating misalignment if your _start gets called as a function, i.e. with RSP % 16 == 8. That is unlike Linux, where RSP % 16 == 0 at the _start entry point and RSP points at argc, argv[], envp[], not a return address.
Alignment is usually only a correctness problem in code that uses movaps / movdqa to copy 16 bytes at a time, with glibc scanf/printf being common examples on Linux (see "glibc scanf Segmentation faults when called from a function that doesn't align RSP"). On Windows, perhaps their DLL functions don't use SSE to copy 16-byte chunks, or they use movups / movdqu. (MSVC usually doesn't emit alignment-required mov instructions, even for alignment-required intrinsics like _mm_store_si128 instead of _mm_storeu_si128, making code that's slower on ancient CPUs like K10 or Core2.)
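As a minimal sketch of the difference between alignment-required and unaligned stores, assume a hypothetical function misaligned_demo (not part of the question's code) that is entered via call and reserves a multiple of 16 bytes, which leaves the stack misaligned:

```nasm
bits 64
default rel

section .text

;; Hypothetical function, for illustration only.
;; On entry RSP % 16 == 8: the caller's stack was 16-byte aligned before `call`,
;; and `call` pushed an 8-byte return address.
misaligned_demo:
    sub     rsp, 32             ; 32 is a multiple of 16, so RSP % 16 is still 8
    pxor    xmm0, xmm0          ; xmm0 = 0
    movups  [rsp], xmm0         ; unaligned store: accepted for any address
    movaps  [rsp], xmm0         ; alignment-required store: faults, [rsp] is not a multiple of 16
    add     rsp, 32
    ret
```

Reserving 40 bytes instead of 32 in this sketch would make RSP % 16 == 0, and both stores would then be legal.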
Normally you wouldn't sub/add RSP around every function call. You'd sub rsp, 40 at the top of a function to reserve shadow space + alignment, and add rsp, 40 once at the end, or not at all if you end by calling a noreturn function instead of using ret. It's normal to not move RSP at all except in the function prologue/epilogue, especially in a calling convention with shadow space.
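A sketch of that prologue/epilogue pattern, using a hypothetical helper get_stdout (the name and the idea of wrapping GetStdHandle are assumptions for illustration, not something from the question):

```nasm
bits 64
default rel

extern GetStdHandle

section .text

;; Hypothetical helper, for illustration: returns the STDOUT handle in rax.
get_stdout:
    sub     rsp, 40             ; 32 bytes of shadow space + 8 to realign (entry RSP % 16 == 8)
    mov     rcx, -11            ; -11 is the constant for STDOUT, as in the question
    call    GetStdHandle        ; RSP % 16 == 0 at the call, shadow space at [rsp+0..31]
    ;; more calls could follow here without adjusting RSP again
    add     rsp, 40             ; undo the reservation once, right before returning
    ret
```

The single 40-byte reservation serves every call made inside the function, which is why RSP only moves in the prologue and epilogue.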
To store stack args, you'd replace push 0 with mov qword [rsp+32], 0. Or you'd actually start your function with push 0 / sub rsp, 32 so that the 0 above the shadow space is there the whole time, and can be an arg for the first function call to take a stack arg. Functions without stack args are required not to touch it, since it's not an arg for them, just part of their caller's stack frame.
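The push 0 / sub rsp, 32 variant could look roughly like this when applied to the question's program. This is a sketch under two assumptions: _start is entered like a function (RSP % 16 == 8), and the exit code is a plain 0 instead of WriteFile's return value.

```nasm
bits 64
default rel

global _start
extern GetStdHandle
extern WriteFile
extern ExitProcess

section .rodata
msg     db "Hello World!", 0x0d, 0x0a
msg_len equ $ - msg

section .text
_start:
    ;; Assumption: entered like a function, so RSP % 16 == 8 here.
    push    0                   ; 5th arg for WriteFile (NULL); RSP % 16 == 0 afterwards
    sub     rsp, 32             ; shadow space below it; the 0 now sits at [rsp+32]

    mov     rcx, -11            ; constant for STDOUT
    call    GetStdHandle        ; takes no stack args, so it must leave [rsp+32] alone

    mov     rcx, rax            ; HANDLE to write to
    lea     rdx, [msg]          ; address of message
    mov     r8, msg_len         ; length of message
    mov     r9, 0               ; null pointer, as in the question
    call    WriteFile           ; finds its 5th arg at [rsp+32], just above its shadow space

    xor     ecx, ecx            ; exit code 0 (assumption, see above)
    call    ExitProcess         ; does not return, so no add rsp / ret is needed
```

Compared with storing the stack arg via mov qword [rsp+32], 0 right before the call, this keeps the total reservation at 40 bytes and sets up the argument as part of the prologue.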