How does main() receive command line arguments?

With the standard C entry:

int main(int argc, char *argv[])
{
    // stuff
}

How is argv populated? The compiler has no idea what size to allocate for an array and I would assume the OS is the entity responsible for passing the additional arguments to the program, but how are they passed to main? Where is the array of pointers initialized? Is that function created by the compiler and then injected into the program launch sequence?

This is something I've always taken for granted. While working on a problem today, I realized I wasn't really sure how the additional arguments eventually reach main, let alone how they reach a program such as CPython, which exposes them as sys.argv.

Bonus: How does the OS handle command line arguments? Clearly the CLI (or shell) knows how to parse the string sequence, but how are the additional arguments "fed into" the executable? Does the compiler add some functionality to just read from stdin (which is a buffer) and parse the parameters accordingly before passing them to main?

CodePudding user response：

Let's take Linux x86-64 as an example.

When a process calls execv("/my/prog", args), it makes a system call to the kernel. The kernel uses the args pointer to locate the argument strings in the process's memory, copies them somewhere else for temporary safekeeping, and then tears down the process's virtual memory. Then it sets up the virtual memory for the new program, and loads its code and data from its binary /my/prog (actually it just maps it for demand loading, but that's not important).

It also allocates a block of memory to be the new program's stack, and that's where it copies the command line arguments, as well as the environment variables and various other data that needs to be passed to the new program. Here it also sets up the array of argv pointers, pointing to the strings themselves in the program's stack memory, and pushes the argument count on the stack as well. The precise layout is specified in the ABI, see Figure 3.9.

Now to actually start the program. The binary's header specifies an address to be used as an entry point. The linker will have arranged that this points to a special piece of startup code. This code usually comes with your standard C library, in an object file with a name like crt0.o. It has been written in assembly, and its job is to process the command line arguments and so forth, set up registers and memory the way that compiled C or C++ code expects, and call a C/C++ function in the standard library which will do further initialization and then call your main. The kernel jumps to the entry point address, switching to unprivileged mode along the way, and the startup code starts executing.

You can see glibc's version in start.S, but a very minimal version could look something like this.

; main takes argc in rdi and argv in rsi

; on entry, the stack pointer points at the argument count
mov rdi, [rsp]

; next is start of the argument pointer array
lea rsi, [rsp + 8]

call main

; main returns, exit the program
mov rdi, rax
call exit
; exit() makes an exit system call and doesn't return

So when control actually reaches your main function, the registers contain the same values as if it had been called by another C function. The argv argument points to an array of pointers on the stack, each of which points to a string located further up in stack memory, as set up by the kernel.

CodePudding user response：

How is argv populated?

The language implementation (by which I mean everything beyond the language itself, such as the shell, the operating system, etc.) takes care of it.

I would assume the OS is the entity responsible for passing the additional arguments to the program

Pretty much, yes.

Where is the array of pointers initialized?

Somewhere that the language implementation chose to initialise them.

Bonus: How does the OS handle command line arguments?

There is no one OS. There are many, and each does its own thing. Some of them are open source, so you will be able to study them.