I'm trying to create a minimal 64-bit Windows executable to better understand how the Windows executable format works.
I wrote very basic assembly and C code as follows.
hi.s
section .text
hi:
db "hi", 0
global sayHi
align 16
sayHi:
lea rax, [rel hi]
ret
start.c
extern int puts();
extern const char *sayHi();
void start() {
puts(sayHi());
}
compiled with,
nasm -fwin64 hi.s
gcc -c -ostart.obj -O3 -fno-optimize-sibling-calls start.c
# I will explain the flag
and linked with,
golink /fo r.exe /console start.obj hi.obj msvcrt.dll
# create a console application `r.exe`
# the default entry point is `start`
The program runs fine and prints hi, but note the gcc flag -fno-optimize-sibling-calls. That flag disables tail-call optimization, so the program always allocates stack space and calls the function. Without the flag, the program crashes.
This is the disassembly without tail-call optimization. I'm not sure why gcc put a nop there, but otherwise it is very simple and runs fine.
0000000000401000 <.text>:
401000: 48 83 ec 28 sub rsp,0x28
401004: e8 27 00 00 00 call 0x401030 # sayHi
401009: 48 89 c1 mov rcx,rax
40100c: e8 ff 2f 00 00 call 0x404010 # puts
401011: 90 nop
401012: 48 83 c4 28 add rsp,0x28
401016: c3 ret
...
401020: 68 69 00 90 90 push 0xffffffff90900069 # "hi"
...
401030: 48 8d 05 e9 ff ff ff lea rax,[rip+0xffffffffffffffe9] # 0x401020
401037: c3 ret
This is the disassembly with tail-call optimization enabled, in which case the program crashes.
0000000000401000 <.text>:
401000: 48 83 ec 28 sub rsp,0x28
401004: e8 27 00 00 00 call 0x401030 # sayHi
401009: 48 89 c1 mov rcx,rax
40100c: 48 83 c4 28 add rsp,0x28
401010: e9 eb 2f 00 00 jmp 0x404000 # puts
...
401020: 68 69 00 90 90 push 0xffffffff90900069 # "hi"
...
401030: 48 8d 05 e9 ff ff ff lea rax,[rip+0xffffffffffffffe9] # 0x401020
401037: c3 ret
Now the program releases its stack space before reaching puts and simply does a jmp instead of a call.
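The branch targets annotated in the two listings can be re-derived from the raw bytes: for a 5-byte e8 (call) or e9 (jmp) instruction, the target is the instruction's address plus 5 plus the little-endian 32-bit displacement. A minimal shell sketch of that arithmetic follows; rel32_target is a hypothetical helper name, and it assumes a non-negative displacement (a negative one, as in the lea, would need sign extension first).

```shell
# target = address of the branch + its length (5 bytes for e8/e9 rel32) + displacement
rel32_target() {
    printf '0x%x\n' $(( $1 + 5 + $2 ))
}

rel32_target 0x40100c 0x2fff   # call puts, no tail call: prints 0x404010
rel32_target 0x401010 0x2feb   # jmp puts, tail call:     prints 0x404000
rel32_target 0x401004 0x27     # call sayHi:              prints 0x401030
```

So the call lands on 0x404010, while the jmp lands on 0x404000, 16 bytes earlier.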
I investigated further to see exactly where it jumps when calling puts.
In the no-tail-call case, the called address 0x404010 in the .idata section holds the instruction jmp QWORD PTR [rip+0xffffffffffffffea] # 0x404000, and 0x404000 seems to contain the address of puts.
However, in the tail-call case, the jump target 0x404000 holds 54 40 00 00, which is not a meaningful instruction. The debugger says the program segfaults at 0x404003, so I'm pretty sure the program chokes trying to execute garbage instructions.
I must be doing something wrong, but I'm not sure what, so could you explain why the tail-call case fails and how to get it to work?
Answer:
The problem was golink not handling tail calls correctly. After some searching, I got GNU ld to link the program with options equivalent to the ones given to golink.
You can create a console-mode Windows executable with GNU ld using this command.
ld -o... --subsystem=console object-files...
--subsystem console and -subsystem=console mean the same thing. Use --subsystem=windows to create a GUI application.
GNU ld can also link directly against Windows DLL files, so in this case simply giving ld a copy of msvcrt.dll from the system folder worked.
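Applied to the files in the question, the whole build might look like the sketch below. The output name r.exe and the object names come from the question. The -e start option is an assumption added here: golink uses start as its default entry point, but to my understanding ld does not, so the entry symbol is named explicitly. msvcrt.dll is assumed to have been copied into the current directory. The gcc line drops -fno-optimize-sibling-calls on the assumption that ld resolves the tail-call jmp correctly, as described above.

```shell
# assemble and compile as in the question (nasm writes hi.obj by default for win64)
nasm -fwin64 hi.s
gcc -c -ostart.obj -O3 start.c

# link a console executable; -e start is an assumed entry-point setting
ld -o r.exe --subsystem=console -e start start.obj hi.obj msvcrt.dll
```

If the link succeeds, disassembling r.exe and checking where the jmp to puts lands is a quick way to confirm that it now reaches a jump stub rather than the import slot itself.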