I am struggling quite a lot with assembly on macOS (x86_64 architecture). I would like to walk you through my understanding of a hello world program, and I would appreciate your feedback, suggestions, and explanations. With that said, let's jump into the code.
Hello world program
I have never felt the pain of a Hello world before. So, this is the code that I copied and pasted from the internet:
global _main
section .text
_main:
mov rax, 0x2000004
mov rdi, 1
mov rsi, str
mov rdi, str.len
syscall
mov rax, 0x2000001
xor rdi, rdi
syscall
section .data
str: db "Hello world"
.len: equ $ - str
So let me embarrass myself:
global _main basically tells the linker where to start, if I am not mistaken.
section .text tells the OS (I guess) that this is the beginning of the actual program.
_main, if I am not wrong, is a function, and this seems to be the notation for functions.
mov rax, 0x2000004: I do not understand what this does. I looked up how a syscall works, and it basically needs a file code (I think this is the 1 on the next line), a pointer to a buffer (where exactly is this buffer? I think it points to the first byte of my string), and the length in bytes of the text (in this case .len). My question is: when I need to write something, how does this hexadecimal business work, and what is the actual job of the mov rax instruction?
mov rdi, 1: I am still not getting what is actually happening. We need a 1 to send output to stdout, but what does this instruction actually do? Where is this 1 going, and what is happening behind the scenes?
Then we have str.len, which I do not quite understand. What is this .len notation? I get that it gives the size of the string, but how can we write it like this?
syscall: this seems like black magic. I know that the OS is doing some dirty tricks, but I am pretty ignorant about operating systems, so I can't see what this thing is doing.
mov rax, 0x2000001: now we need to exit the program. Again, why do we need to load this hex number into a register? (Yes, I know this is the command to exit, but again, what is actually happening?)
xor rdi, rdi: this is probably the only bit that I get: we are setting the rdi register to 0 by XORing it with itself.
syscall: this is black magic
str: "Hello World": I get this :)
.len: I do not understand this dot notation. I think that $ means "address of here", or at least that is what I found when I looked it up, and I think it is correct. As you can see, there are many gaps in my understanding, and I am sorry if these things sound trivial to some of you, but I could not find any book, website, or documentation that offers one of those "quickstarts", "tutorials", or "intros". I am jumping from one blog to another, and the code always seems to look different :)
Answer:
1. No, that just exports the symbol.
2. No, that tells the assembler which section to put the following stuff into. .text is a default section for code.
3. No, that's a label. Function entry points are usually denoted by labels, but not all labels are functions.
4. On macOS the value 0x2000004 is the code that specifies you want a write system call. The OS will look in rax to determine what the caller wants. All system services have a code. You can imagine the OS doing something like if (rax == 0x2000004) do_write(rdi, rsi, rdx);
5. rdi is a register. You know the registers, right? Similarly to point #4 above, once the OS has determined you wanted a write, it will check rdi for the destination file descriptor.
6. str.len is just a label syntax. The value is defined at the bottom. This should be loaded into rdx, not rdi, though.
7. It transfers control to the OS, which then looks at the contents of the registers and performs the action requested. The OS is just code, albeit privileged.
8. Yes, $ is the current location, which is the end of the string. So subtracting the start of the string will give you the length. The leading dot is just a special label which instructs the assembler to prefix it with the nearest previous non-local label, in this case str. So that's equivalent to writing str.len.