Curious about how macOS prepares the stack, I wrote an x86_64 assembly program that prints the top of the stack to stdout right when a process starts:
global start
start: ; entry point of the binary, called by the loader
push rsp ; push the stack pointer to the stack so that we'll see that too
mov rdi, 1 ; file to write to: file descriptor 1 (STDOUT)
lea rsi, [rsp] ; source of the write: stack
mov rdx, 64 ; number of bytes to write: 64 (8 x 64-bit integers)
mov rax, 0x02000004 ; MacOS syscall number for write
syscall
mov rsi, [rsp + 16] ; smoke test: argv contents
mov rdx, 16 ; we expect the argv[0] ("./inspect_stack\0") to be 16 bytes long
mov rax, 0x02000004
syscall
mov rsi, [rsp + 32] ; another smoke test: envp???
mov rdx, 11
mov rax, 0x02000004
syscall
xor edi, edi ; exit status 0
mov rax, 0x02000001 ; MacOS syscall number for exit
syscall
Running this program and inspecting the output:
nasm -f macho64 inspect_stack.asm && ld inspect_stack.o -static -o inspect_stack && ./inspect_stack | xxd -e -g 8 -c 8
I see something like this (the comments are my own):
00000000: 00007ff7bfeff6b0 ........ # this is the stack pointer we pushed
00000008: 0000000000000001 ........ # argc
00000010: 00007ff7bfeff880 ........ # argv; see the smoke test result
00000018: 0000000000000000 ........ # a null pointer???
00000020: 00007ff7bfeff890 ........ # are these part of envp?
00000028: 00007ff7bfeff89f ........ # ...seems like an array of pointers stored inline?
00000030: 00007ff7bfeff8dc ........ # ...and they seem to point at a continuous buffer
00000038: 00007ff7bfeff8ed ........
00000040: 636570736e692f2e ./inspec # the result of the 1st smoke test. yes, argv[0]!
00000048: 006b636174735f74 t_stack.
00000050: 6573552f3d445750 PWD=/Use # the result of the 2nd smoke test... seems like envp?
00000058: 2f7372 rs/
My understanding was that at the start of the program the stack would hold a 64-bit integer (argc) followed by two pointers (to argv and to envp). That doesn't seem to be the case, unless the envp pointer is null for some reason. On the other hand, what looks like the envp array, stored inline, seems to start right after the null. What is the actual layout of the stack when the process starts?
CodePudding user response:
After inspecting a bit more and passing more arguments, I noticed that I was mistaken in expecting two pointers, to argv and to envp, at the top of the stack. Instead, argv and envp are stored inline, as arrays of pointers to the associated strings. Both arrays are null-terminated, so the null value I was seeing was actually the terminator of argv. Adding more arguments makes this a lot clearer:
nasm -f macho64 inspect_stack.asm && ld inspect_stack.o -static -o inspect_stack && ./inspect_stack first second | xxd -e -g 8 -c 8
00000000: 00007ff7bfeff698 ........
00000008: 0000000000000003 ........ # argc
00000010: 00007ff7bfeff878 x....... # argv[0]
00000018: 00007ff7bfeff888 ........ # argv[1]
00000020: 00007ff7bfeff88e ........ # argv[2]
00000028: 0000000000000000 ........ # argv end
00000030: 00007ff7bfeff895 ........ # envp[0]
00000038: 00007ff7bfeff8a4 ........ # envp[1] and so on
00000040: 636570736e692f2e ./inspec
00000048: 006b636174735f74 t_stack.
00000050: 5000646e6f636573 second.P # the second smoke test now sees argv[2]!
00000058: 3d4457 WD= # seems that the envp strings are located right after the argv strings
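The annotations in the dump above follow a simple rule, which can be sketched in C. This is only a model: the `stack` array stands in for the 8-byte slots starting at argc (offset 0x08 in the dump, since offset 0x00 holds the pushed stack pointer), and `slot_kind` and `classify_slot` are hypothetical names, not any system API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical labels for the 8-byte slots of the entry stack. */
enum slot_kind { SLOT_ARGC, SLOT_ARGV, SLOT_ARGV_END, SLOT_ENVP, SLOT_ENVP_END };

/* Classify slot i of a modelled stack, where stack[0] is argc.
   Assumption: i never goes past the NULL that terminates envp. */
static enum slot_kind classify_slot(const uintptr_t *stack, size_t i) {
    assert(stack != NULL);
    size_t argc = (size_t)stack[0];
    if (i == 0)        return SLOT_ARGC;
    if (i <= argc)     return SLOT_ARGV;      /* this is argv[i - 1] */
    if (i == argc + 1) return SLOT_ARGV_END;  /* NULL after the last argument */
    /* Everything after the argv terminator is envp[i - argc - 2]. */
    return stack[i] == 0 ? SLOT_ENVP_END : SLOT_ENVP;
}
```

With argc = 3, slots 1 to 3 come out as argv entries, slot 4 as the argv terminator, and slot 5 onward as envp entries, which matches the comments in the dump.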
TL;DR: I thought that the second and third 64-bit values on the stack were char **argv and char **envp. Instead, they were argv[0] and argv[1]. Now, at process entry (before the extra push in the program above), to get the char **argv that C main would expect, I could take the address of [rsp + 8] (8 bytes to skip argc), for example with lea. To get char **envp, I could mov rax, [rsp] to load argc and then take the address of [rsp + 8 + rax*8 + 8] (8 bytes to skip argc, then argc pointers, and finally another 8 bytes to skip the null terminator).
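The offset arithmetic can be checked with a small C sketch. It is a model under stated assumptions, not the loader's code: `sp` is a hypothetical array standing in for the stack at process entry, with `sp[0]` as argc, and `argv_slot`, `envp_slot`, and `count_pointers` are illustrative helper names. Indices count 8-byte slots, so slot `k` corresponds to `[rsp + 8*k]`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Slot index of argv[0], counted from the argc slot: [rsp + 8]. */
static size_t argv_slot(void) {
    return 1;
}

/* Slot index of envp[0]: [rsp + 8 + argc*8 + 8]. */
static size_t envp_slot(const uintptr_t *sp) {
    assert(sp != NULL);
    size_t argc = (size_t)sp[0];
    return 1 + argc + 1;  /* skip argc, argc pointers, and the argv NULL */
}

/* Number of pointers in the NULL-terminated array that starts at slot. */
static size_t count_pointers(const uintptr_t *sp, size_t slot) {
    size_t n = 0;
    while (sp[slot + n] != 0) {
        n++;
    }
    return n;
}
```

For a modelled stack with three arguments, `envp_slot` returns 5, which is the slot the second dump labels envp[0] once the pushed stack pointer at offset 0x00 is discounted.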