Step-by-step hello world in assembly for macOS

Time:11-06

I am struggling quite a lot with assembly for macOS (x86_64 architecture). I would like to walk you through my understanding of a hello world program, and I would appreciate your feedback, with suggestions and explanations. Having said that, let’s jump into the code.

Hello world program

I never felt the pain of a hello world before. So, this is the code that I copied and pasted from the internet:
global _main
section .text

_main:
   mov rax, 0x2000004
   mov rdi, 1
   mov rsi, str
   mov rdi, str.len
   syscall

   mov rax, 0x2000001
   xor rdi, rdi
   syscall

section .data
str: db "Hello world"
.len: equ $ - str

So let me embarrass myself:

  1. global _main basically tells the linker where to start, if I am not mistaken.

  2. section .text tells the OS (I guess) that this is the beginning of the actual program.

  3. _main, if I am not wrong, is a function, and this seems to be the notation for functions.

  4. mov rax, 0x2000004: I do not understand what this does. I looked up how a syscall works, and it basically needs a file code (I think this is the 1 on the next line), a pointer to a buffer (where exactly is this buffer? I think it points to the first byte of my string) and the length in bytes of the piece of text (in this case .len). My question is: when I need to write something, how does this hexadecimal business work, and what is the actual job of the mov rax instruction?

  5. mov rdi, 1: I am still not getting what is actually happening. We need a 1 to set the output to stdout, but what is the actual function of this instruction? Where is this 1 going, and what is happening behind the scenes?

  6. Then we have this str.len, which I do not quite understand. What is this .len notation? I get that this gives the size of the string, but how can we write it like this?

  7. syscall: this function seems like black magic. I know that the OS is doing some dirty tricks, but I am pretty ignorant of OSes, so I can’t work out what this thing is doing.

  8. mov rax, 0x2000001: now we need to exit the program. Again, why do we need to load this hex number into a register? (Yes, I know this is the command to exit, but again, what is actually happening?)

  9. xor rdi, rdi: this is probably the only bit that I get. We are setting the content of the rdi register to 0 by XORing it with itself.

  10. syscall: this is black magic.

  11. str: “Hello World”: I get this :)

  12. .len: I do not understand this dot notation. I think that $ means “address of here”, or at least this is what I looked up, and I think it is correct.

As you can see, there are many gaps in my understanding, and I am sorry if these things sound trivial to some of you, but I could not really find any kind of book, website or documentation that offers one of those “quickstarts”, “tutorials” or “intros”. I am jumping from one blog to the other and the code always seems to look different :)

CodePudding user response:

  1. No, that just exports the symbol.
  2. No, that tells the assembler which section to put the following stuff into. .text is a default section for code.
  3. No, that's a label. Function entry points are usually denoted by labels, but not all labels are functions.
  4. On macOS the value 0x2000004 is the code that specifies you want a write system call. The OS will look in rax to determine what the caller wants. All system services have a code. You can imagine the OS doing something like if (rax == 0x2000004) do_write(rdi, rsi, rdx);
  5. rdi is a register. You know the registers, right? Similarly to point #4 above, once the OS has determined that you want a write, it checks rdi for the destination file descriptor.
  6. str.len is just label syntax. The value is defined at the bottom. It should be loaded into rdx, not rdi, though.
  7. It transfers control to the OS, which then looks at the contents of the registers and performs the requested action. The OS is just code, albeit privileged.
  12. Yes, $ is the current location, which here is the end of the string, so subtracting the start of the string gives you the length. The leading dot marks a local label: it instructs the assembler to prefix the name with the nearest previous non-local label, in this case str. So that's equivalent to writing str.len.
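
To make the "the OS is just code" picture a bit more concrete, here is a rough user-space sketch in C of what the answer describes. It is only a toy model, not how the macOS kernel is actually written: struct regs, toy_syscall, run_hello and the BSD_SYSCALL macro are made-up names for illustration, and the message is assumed to end with a newline. The one macOS-specific fact it relies on is that the BSD syscall class is 2, stored in the bits above bit 24, which is where the 0x2000000 part of 0x2000004 comes from.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* macOS puts a syscall class in the high bits of rax; class 2 is the BSD/Unix
 * class, so write (number 4) becomes 0x2000004 and exit (number 1) 0x2000001. */
#define BSD_SYSCALL(n) ((2u << 24) | (n))

static_assert(BSD_SYSCALL(4) == 0x2000004, "write");
static_assert(BSD_SYSCALL(1) == 0x2000001, "exit");

/* Hypothetical stand-in for the registers the OS inspects after "syscall". */
struct regs { uint64_t rax, rdi, rsi, rdx; };

enum toy_action { TOY_CONTINUE, TOY_EXIT };

/* Toy dispatcher: look at rax to see which service is wanted, then read the
 * arguments from rdi, rsi and rdx. The result is handed back in rax. */
static enum toy_action toy_syscall(struct regs *r)
{
    switch (r->rax) {
    case BSD_SYSCALL(4):   /* write(fd = rdi, buf = rsi, len = rdx) */
        r->rax = (uint64_t)write((int)r->rdi,
                                 (const void *)(uintptr_t)r->rsi,
                                 (size_t)r->rdx);
        return TOY_CONTINUE;
    case BSD_SYSCALL(1):   /* exit(status = rdi) */
        return TOY_EXIT;
    default:               /* unknown service code */
        r->rax = (uint64_t)-1;
        return TOY_CONTINUE;
    }
}

static const char msg[] = "Hello world\n";

/* Same idea as ".len: equ $ - str": size from start to end of the data.
 * The -1 drops the '\0' that C appends to string literals. */
enum { MSG_LEN = sizeof msg - 1 };

/* Mirrors the assembly program, with the length in rdx as the answer advises.
 * Returns the exit status the program asked for, or -1 if it never exited. */
static int run_hello(int fd)
{
    struct regs r;

    r.rax = BSD_SYSCALL(4);            /* mov rax, 0x2000004 */
    r.rdi = (uint64_t)fd;              /* mov rdi, 1 (when fd is stdout) */
    r.rsi = (uint64_t)(uintptr_t)msg;  /* mov rsi, str */
    r.rdx = MSG_LEN;                   /* mov rdx, str.len */
    toy_syscall(&r);                   /* syscall */

    r.rax = BSD_SYSCALL(1);            /* mov rax, 0x2000001 */
    r.rdi = 0;                         /* xor rdi, rdi */
    if (toy_syscall(&r) == TOY_EXIT)   /* syscall */
        return (int)r.rdi;
    return -1;
}
```

Calling run_hello(1) from a main function writes the greeting to stdout and returns the requested exit status, 0. The mov instructions therefore do nothing special on their own: they only place values where the OS has agreed to look for them once syscall hands over control.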