"Hello world" in x64 assembly for Windows - Shadow space / Stack alignment

Time:08-22

This is a repost of https://codereview.stackexchange.com/questions/278940/hello-world-in-x64-assembly-for-windows-shadow-space-stack-alignment/; it was suggested to me that Stack Overflow might have more fitting answers/comments.

I'm currently trying to delve into x64 assembly under Windows using NASM, and created a minimalistic "Hello World" application.

It is mainly meant as an educational resource for me and possibly others, hence the heavy documentation style.

A full repo with build instructions and code is located at hello_kernel32, this is the relevant source file:

;; Resources:
;; https://sonictk.github.io/asm_tutorial/
;; https://gist.github.com/mcandre/b3664ffbeb4f5764b36a397fafb04f1c
;; https://retroscience.net/x64-assembly.html


;; Make clear this file contains 64bit assembly
bits 64

;; Use rip-relative addressing
default rel

;; Export entry symbol (this is specified in the call to link.exe)
global _start

;; Import external symbols
;; (all of them exist in kernel32.lib, which gets passed to link.exe in addition to the program's object file hello.obj)
;;
;; Why import symbols/functions from kernel32.lib?
;;
;; In windows, the "low level API stack" is: Kernel < Syscalls < ntdll.dll < kernel32.dll (and others like user32.dll)
;;
;; * The kernel itself cannot be accessed by user programs for obvious reasons (CPU ring protection modes)
;; * Syscalls could be performed, but are undocumented and evidently unstable between different versions of windows
;; * ntdll.dll is only partially documented and not intended for external use
;; * kernel32.dll (and friends) are the "official" low-level entry points to the windows API
extern GetStdHandle
extern WriteFile
extern ExitProcess


;; This section contains read-only data
section .rodata

    ;; Store the output string followed by CRLF as a sequence of bytes, at address 'msg'
    msg db "Hello World!", 0x0d, 0x0a

    ;; The length will be needed by the output function, and can be statically calculated at assembly time with 'equ'
    ;; It is actually a nifty trick that calculates the offset between the current address '$', and the address of 'msg'
    ;; See https://nasm.us/doc/nasmdoc3.html#section-3.2.4
    msg_len equ $ - msg


;; This section contains the code
section .text

_start:
    ;; This will discard the return address on the stack which we don't need since we will never call `ret`,
    ;; but terminate via `call ExitProcess`.
    ;; It has the positive effect of aligning the stack to 16bytes for upcoming calls, and will provide _our_
    ;; shadow space to those called functions.
    add rsp, 8;

    ;; For being able to print text, we first need to acquire a HANDLE to STDOUT
    ;; This HANDLE is a required parameter for the call to WriteFile

    ;; HANDLE = GetStdHandle(-11)
    ;;
    ;; See https://docs.microsoft.com/en-us/windows/console/getstdhandle
    ;;
    ;; Parameter 1 (rcx): requests the type of HANDLE, -11 is the constant for STDOUT
    ;; Return value (rax): HANDLE (an address with some type of meaning) is stored in rax, as per calling conventions
    mov rcx, -11
    call GetStdHandle

    ;; code = WriteFile(HANDLE, msg, msg_len, NULL, NULL)
    ;;
    ;; See https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-writefile
    ;;
    ;; Parameter 1 (rcx): HANDLE to write to
    ;; Parameter 2 (rdx): Address of message to print
    ;; Parameter 3 (r8): Length of message
    ;; Parameter 4 (r9): Write amount of written bytes to this address, null pointer
    ;;                   (Required according to docs when parameter 5 is null, but passing null seems to work just fine)
    ;; Parameter 5 (on stack): Unused optional parameter, null pointer
    ;; Return value (rax): Nonzero on success
    mov rcx, rax
    lea rdx, [msg]
    mov r8, msg_len
    mov r9, 0
    mov qword [rsp + 32], 0 ;; We already allocated the shadow space in the prolog and can't use push.
    call WriteFile

    ;; ExitProcess(code)
    ;;
    ;; See https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-exitprocess
    ;;
    ;; Parameter 1 (rcx): Exit code
    mov rcx, rax
    call ExitProcess
    ;; ExitProcess will internally issue the syscall for terminating the process after doing some cleanup
    ;; We messed with rsp in the prolog which would make a `ret` impossible anyway

My main question is about:

  • Shadow space / stack alignment

Most resources I found seem to completely neglect this when doing simple "hello world" programs. My program also seems to run just fine when removing all the sub rsp/add rsp statements.

So I'm wondering what not following those conventions really means for code correctness.

My current understanding is that GetStdHandle/WriteFile/ExitProcess are simply not using shadow space and also do not perform any operation that requires an aligned stack in their current implementation as present on my machine.

However, any update to kernel32.dll is free to change those implementations in a way that relies on shadow space being present and/or an aligned stack.

Therefore code neglecting shadow space/stack alignment for external calls is incorrect in the general sense of incorrectly interfacing with an external API, even though many APIs may be built in a way they can tolerate that slightly incorrect access (but this tolerance is an implementation detail that may change at any time).

-> Is this definition/understanding correct? / Anything to add/clarify?

  • General question

Are there any other apparent mistakes / misunderstandings in the code or comments?

Answer:

No, those functions probably are writing into their shadow space; your program just doesn't depend on [rsp+0..31] being unmodified across a call.

You call ExitProcess instead of ret (because _start might not even have a return address on the stack? IDK if Windows does).

And you don't keep any local vars in your own stack space that becomes shadow space for callees, so they're not stepping on your local vars.

But yes, they are tolerating misalignment if your _start gets called as a function, i.e. with RSP % 16 == 8. That is unlike Linux, where RSP % 16 == 0 at the _start entry point and RSP points at argc, argv[], envp[], not a return address.

Alignment is usually only a correctness problem in code that uses movaps / movdqa to copy 16 bytes at a time; glibc scanf/printf are common examples on Linux (see "glibc scanf Segmentation faults when called from a function that doesn't align RSP"). On Windows, perhaps their DLL functions don't use SSE to copy 16-byte chunks, or they use movups / movdqu. (MSVC usually doesn't emit alignment-required mov instructions, even for alignment-required intrinsics like _mm_store_si128 as opposed to _mm_storeu_si128, which makes the code slower on ancient CPUs like K10 or Core2.)


Normally you wouldn't sub/add RSP around every function call. You'd sub rsp, 40 at the top of a function to reserve shadow space + alignment, and add rsp, 40 once at the end, or not at all if you end by calling a noreturn function instead of using ret. It's normal to not move RSP at all except in the function prologue/epilogue, especially in a calling convention with shadow space.
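
As a minimal sketch of that pattern applied to the program from the question: it assumes _start is entered like a normal function (a return address on the stack, so RSP % 16 == 8) and the same build setup as in the question (NASM, link.exe with kernel32.lib, entry point _start). Passing NULL for WriteFile's fourth parameter simply mirrors the question, and the exit code 0 is an arbitrary choice for illustration.

```nasm
bits 64
default rel

global _start
extern GetStdHandle
extern WriteFile
extern ExitProcess

section .rodata
    msg db "Hello World!", 0x0d, 0x0a
    msg_len equ $ - msg

section .text
_start:
    ; Assumption: RSP % 16 == 8 on entry (called like a function).
    ; 32 bytes of shadow space + 8 more bytes that realign RSP to 16
    ; and double as the slot for stack argument 5.
    sub rsp, 40

    mov ecx, -11                ; STD_OUTPUT_HANDLE
    call GetStdHandle

    mov rcx, rax                ; arg 1: handle returned above
    lea rdx, [msg]              ; arg 2: buffer
    mov r8d, msg_len            ; arg 3: number of bytes to write
    xor r9d, r9d                ; arg 4: NULL, mirroring the question
    mov qword [rsp + 32], 0     ; arg 5: NULL, first slot above the shadow space
    call WriteFile

    xor ecx, ecx                ; exit code 0 (arbitrary for this sketch)
    call ExitProcess            ; noreturn, so no add rsp, 40 / ret afterwards
```

RSP is adjusted exactly once here; every call below the sub rsp, 40 sees an aligned stack and 32 bytes of shadow space at [rsp+0..31].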

To store stack args, you'd replace push 0 with mov qword [rsp+32], 0. Or you'd actually start your function with push 0 / sub rsp, 32 so that the 0 above the shadow space is there the whole time, and can be an arg for the first function call that takes a stack arg. Functions without stack args are required not to touch it, since it's not an arg for them, just part of their caller's stack frame.
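
The push 0 / sub rsp, 32 variant could look roughly like the following sketch. It assumes RSP % 16 == 8 on entry to _start and the same NASM / link.exe / kernel32.lib setup as in the question; NULL for WriteFile's fourth parameter mirrors the question, and the exit code 0 is an arbitrary choice.

```nasm
bits 64
default rel

global _start
extern GetStdHandle
extern WriteFile
extern ExitProcess

section .rodata
    msg db "Hello World!", 0x0d, 0x0a
    msg_len equ $ - msg

section .text
_start:
    ; Assumption: RSP % 16 == 8 on entry, as for a normal function call.
    push 0                  ; RSP % 16 becomes 0; this qword will be WriteFile's arg 5
    sub rsp, 32             ; shadow space for every call below; alignment unchanged

    mov ecx, -11            ; STD_OUTPUT_HANDLE
    call GetStdHandle       ; takes no stack args, so it must leave [rsp + 32] alone

    mov rcx, rax            ; arg 1: handle
    lea rdx, [msg]          ; arg 2: buffer
    mov r8d, msg_len        ; arg 3: number of bytes to write
    xor r9d, r9d            ; arg 4: NULL, mirroring the question
    call WriteFile          ; arg 5 is the 0 pushed at the top, now at [rsp + 32]

    xor ecx, ecx            ; exit code 0 (arbitrary for this sketch)
    call ExitProcess        ; noreturn
```

Once a callee has actually received that slot as its own argument, it is safest not to assume it still holds 0; a later call that takes a stack arg should store the value again with mov qword [rsp + 32], ....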
