What implements the stack in a typical process' memory?

I have always been confused about where 'the stack' is implemented. I know about the typical memory layout for a process (at least on Unix-like systems) but I've always wondered what actually sets the structure of that layout. Is it the operating system? The compiler? If it is, then I don't understand how the x86 ISA can have a push instruction; wouldn't this mean that some kind of stack must exist before any OS is even loaded?

CodePudding user response：

In mainstream OSes, the OS maps some stack memory and enters user-space with RSP pointing near the top of that mapping.

But that's a design choice, not a necessity. An OS could require user-space processes to lea rsp, [rel top_of_some_bss_array] or whatever early in _start, before running any instructions that use the stack. (And before installing any signal handlers or anything else that could asynchronously use RSP.)

(In a protected-mode or 64-bit OS, the kernel normally sets things up so hardware interrupts use a separate kernel stack, not the user-space stack pointer, for security reasons. But if not, or in 16-bit DOS, having a valid stack pointer at all times was important unless you disabled interrupts.)

For example, on Linux, the Q&A "Analyzing memory mapping of a process with pmap. [stack]" discusses how the initial stack grows, and how that magic only works for the main thread's stack; new threads in the same process need you (or the thread library) to manually mmap some space. But the thread-creation system call, clone, takes an argument for what to set the new thread's stack pointer to. So the API is still designed around every task (thread) having a valid stack pointer before it starts.
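To make the clone point concrete, here is a minimal sketch, assuming Linux with glibc's clone() wrapper. The parent mmaps a region, passes its top address as the new task's stack pointer, and the task reports whether one of its locals landed in that region. The names task_fn, task_report and run_task_on_mmap_stack, the 256 KiB size, and the CLONE_VM | SIGCHLD flag choice are mine, not anything a thread library necessarily does.

```c
#define _GNU_SOURCE            /* for clone() and MAP_STACK in glibc */
#include <sched.h>
#include <signal.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>

#define TASK_STACK_SIZE (256 * 1024)   /* arbitrary size for this sketch */

struct task_report {
    uintptr_t lo, hi;      /* bounds of the stack mapping made by the parent */
    int local_in_range;    /* set by the new task */
};

static int task_fn(void *arg)
{
    struct task_report *r = arg;
    volatile int local = 0;            /* lives on the new task's stack */
    uintptr_t p = (uintptr_t)&local;
    r->local_in_range = (p >= r->lo && p < r->hi);
    return 0;                          /* becomes the task's exit status */
}

/* Returns 1 if the task ran on the mmap'd region, 0 if not, -1 on error. */
int run_task_on_mmap_stack(void)
{
    char *stack = mmap(NULL, TASK_STACK_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED)
        return -1;

    struct task_report r;
    r.lo = (uintptr_t)stack;
    r.hi = r.lo + TASK_STACK_SIZE;
    r.local_in_range = 0;

    /* The stack grows down, so the task's initial stack pointer is the
       top of the mapping. CLONE_VM shares the address space, so the task
       can write into r; SIGCHLD lets the parent wait with waitpid. */
    int pid = clone(task_fn, stack + TASK_STACK_SIZE, CLONE_VM | SIGCHLD, &r);
    int ok = -1;
    if (pid != -1 && waitpid(pid, NULL, 0) == pid)
        ok = r.local_in_range;

    munmap(stack, TASK_STACK_SIZE);
    return ok;
}
```

The kernel never allocates anything here: the caller decides where the stack lives and how big it is, and clone only loads the given address into the new task's stack pointer.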

Also related: "Beginning of stack on Linux" - the Linux kernel randomizes the initial stack pointer. It also uses stack space to pass argc, argv, and envp to user-space, along with storing the pointed-to arg and environment strings.
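One way to see this from inside a process is to look up the mapping labelled [stack] in /proc/self/maps and test whether an address falls inside it. This is a sketch assuming Linux with /proc mounted; in_initial_stack is a hypothetical helper name. In the main thread, the address of a local variable and the environ pointer (which points at the envp array the kernel built) would be expected to fall inside that mapping, while a global would not.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

extern char **environ;   /* POSIX: the environment pointer array */

/* Returns 1 if p lies in the mapping labelled [stack] in /proc/self/maps,
   0 if it does not, -1 if the file or the label is unavailable. */
int in_initial_stack(const void *p)
{
    FILE *f = fopen("/proc/self/maps", "r");
    if (!f)
        return -1;

    char line[512];
    int result = -1;
    while (fgets(line, sizeof line, f)) {
        uintptr_t lo, hi;
        /* Each line starts with "start-end" in hex. */
        if (strstr(line, "[stack]") &&
            sscanf(line, "%" SCNxPTR "-%" SCNxPTR, &lo, &hi) == 2) {
            uintptr_t a = (uintptr_t)p;
            result = (a >= lo && a < hi);
            break;
        }
    }
    fclose(f);
    return result;
}
```

Calling in_initial_stack(&some_local) and in_initial_stack(environ) from main is the intended use. Because of the randomization mentioned above, the bounds it finds will generally differ from run to run.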

Instructions can exist that require some setup to use.

For example, at power-on in an x86 CPU, rsp holds 0 (or at least esp does on Intel). The machine boots in 16-bit unreal mode, so only ss:sp is initially relevant. A push would wrap to ss:FFFE.
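The wrap to ss:FFFE is just 16-bit modular arithmetic. As a toy model (not an emulator; struct seg_stack, push16 and pop16 are made-up names), a 16-bit push decrements SP by 2 modulo 2^16 and then stores the value little-endian at the new offset:

```c
#include <stdint.h>

/* Toy model of one 64 KiB stack segment. */
struct seg_stack {
    uint8_t  mem[0x10000];  /* bytes at ss:0000 .. ss:FFFF */
    uint16_t sp;            /* 16-bit stack pointer */
};

/* 16-bit push: decrement SP by 2 (mod 2^16), then store little-endian. */
void push16(struct seg_stack *s, uint16_t value)
{
    s->sp = (uint16_t)(s->sp - 2);                  /* 0x0000 - 2 wraps to 0xFFFE */
    s->mem[s->sp] = (uint8_t)(value & 0xFF);        /* low byte at the lower address */
    s->mem[(uint16_t)(s->sp + 1)] = (uint8_t)(value >> 8);
}

/* 16-bit pop: load little-endian, then increment SP by 2 (mod 2^16). */
uint16_t pop16(struct seg_stack *s)
{
    uint16_t lo = s->mem[s->sp];
    uint16_t hi = s->mem[(uint16_t)(s->sp + 1)];
    s->sp = (uint16_t)(s->sp + 2);
    return (uint16_t)(lo | (hi << 8));
}
```

Starting from sp = 0, one push16 leaves sp at 0xFFFE with the two bytes stored at offsets 0xFFFE and 0xFFFF, and a matching pop16 brings sp back to 0. The instruction works fine; whether that memory is somewhere useful is up to whoever set ss:sp.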

The BIOS code at the reset vector should set ss:sp to point somewhere before running any push or pop, or call/ret, or enabling interrupts (which will asynchronously use the stack). I assume the system boots with IF=0 because software won't yet have stored an interrupt table and used lidt. (This code I'm talking about is in the BIOS itself, before your own code could run via UEFI or as a legacy BIOS MBR bootloader.)

Similarly, the existence of xlat doesn't imply that RBX is always a valid pointer. Don't put an xlat in your code where it will run when RBX isn't a valid pointer, though! Or just don't use it, since it's not very fast. The same goes for loop: RCX isn't always a valid loop counter. And again, it's not fast except on recent AMD CPUs; prefer dec ecx / jnz.

For convenience and efficiency, it works much better to have the OS just provide a stack mapping, instead of requiring the process to allocate its own stack in the BSS, or with an mmap system call or something. It's assumed that every process will want a stack, so the OS might as well set that up ahead of time, before entering user-space.

CodePudding user response：

Think of, say, Windows, Linux, macOS, etc. The operating system has rules for binary file formats, and it has a loader that loads the binaries it supports. Those rules also settle questions such as whether the application needs to initialize .data and .bss itself, or whether the operating system's loader will zero .bss for you based on entries in the binary.

The operating system also defines the memory space for a typical application.

Then when you or someone builds, say, a gcc or llvm toolchain for this operating system and target (x86, arm, etc.), it is built knowing the target operating system and defaults, ideally, to the preferred binary format. The GNU C library will be specific to that operating system, and so on.

So it may seem that I just download a pre-built Windows GNU toolchain or a pre-built Linux toolchain, and they both have gcc and other tools. But those two builds are quite different with respect to the backend of the C library and how the binaries are built (linker script, etc.).

The stack space is ultimately your responsibility as the programmer, as are the heap, .text, etc. If you are doing bare metal for an MCU, you may or may not see this (a lot of folks use the canned vendor stuff and, here again, do not see what is going on). But almost everyone uses pre-built toolchains, and even if you build your own, say gnu or llvm from sources, you will still likely get defaults prepared for you by someone else that match the operating system environment.