How should I correctly learn GNU assembly?

Time:05-10

I've recently been learning assembly on Ubuntu, and I found several assemblers, such as NASM, MASM, and GAS. Their syntax differs (particularly the pseudo-directives): NASM and MASM support Intel syntax, while GAS supports AT&T syntax.

Now I want to learn AT&T assembly, so I'm using the GAS assembler. I found the GAS manual, but it's painful for me to read: it lacks specific examples and mostly just introduces the basic syntax.

I have only found partial "guides".

For example, for https://sourceware.org/binutils/docs/as.html#Scl, I want to know which storage-class values exist. gccs-assembly-output-of-an-empty-program-on-x86-win32 tells me that .scl 2 means external storage class, but I can't find anything else about this value, and the same goes for other directives.

Maybe I'm heading in the wrong direction, or looking for information the wrong way. So I want to ask: is there a guide or manual for AT&T syntax with examples and clear explanations for beginners?

I am confused by the pseudo-directives. What does the standard structure of an assembly program look like?

CodePudding user response:

When learning assembly, you can mostly focus on the instructions and the .section and .globl directives, unless you're trying to learn how GAS directives produce metadata, including debug info and other stuff that's more useful for debugging high-level languages than hand-written asm.

A lot of debug-info directives, apparently including .scl, only have their syntax documented, with no real details on what the values mean or what other tools might care about the value you put there.

You can write working hand-written asm to play around with using pretty much only .section and .globl. (And for static data, .byte / .short / .long / .quad and .ascii / .asciz for initialized, .space in the BSS, and .p2align if you need it).
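As a rough sketch of what that looks like in practice, here is a minimal x86-64 Linux program in AT&T syntax. The labels msg, msg_len, and buf, the file name hello.s, and the build commands in the comments are illustrative assumptions, not something taken from the GAS manual.

```asm
# hello.s: x86-64 Linux, AT&T syntax (file name is an assumption).
# Assumed build:  as hello.s -o hello.o && ld hello.o -o hello
        .section .rodata
msg:    .ascii  "Hello, world\n"
msg_len = . - msg                   # length computed at assemble time

        .section .bss
        .p2align 4                  # align buf to 16 bytes
buf:    .space  64                  # 64 zeroed bytes (unused, only shows .space)

        .section .text
        .globl  _start              # make the entry point visible to the linker
_start:
        mov     $1, %eax            # x86-64 Linux system call 1 = write
        mov     $1, %edi            # fd 1 = stdout
        lea     msg(%rip), %rsi     # address of the string
        mov     $msg_len, %edx      # number of bytes to write
        syscall

        mov     $60, %eax           # system call 60 = exit
        xor     %edi, %edi          # exit status 0
        syscall
```

Everything that isn't an instruction or a label here is one of the handful of directives listed above.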

That's why Matt Godbolt's "compiler explorer" site filters out directives by default, except for data initializers, because of course data is in .data or .rdata, and code in .text, and the interesting part about compiler asm output is the actual instructions (and static data). See How to remove "noise" from GCC/clang assembly output?


Being fully compliant with Windows expectations for SEH metadata for stack unwinding (especially in 64-bit code) may take some extra directives, same for x86-64 SysV .cfi stack-unwind metadata. But that's something you can worry about after you understand the basics of assembly, if you ever need to use hand-written asm in a robust production-quality context, rather than just as a one-off experiment to learn how instructions work.


https://stackoverflow.com/tags/x86/info has some links to tutorials (and manuals), and Programming from the Ground Up is a good free book for 32-bit x86 with AT&T syntax. (And GAS directives). It's aimed at running on Linux, so it can teach some OS / computing concepts along the way, the kind of background knowledge necessary for assembly (and system calls) to make sense. Online HTML version

To follow it on a modern 64-bit GNU/Linux distro, you may need as --32 and ld -m elf_i386 to override the defaults to 32-bit. And gcc -m32 -fno-pie -no-pie anywhere the book says gcc.
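For instance, a book-style 32-bit program might be built as sketched below. The file name exit32.s and the exit status 42 are arbitrary choices for illustration, and running the result assumes your kernel still supports 32-bit executables.

```asm
# exit32.s: 32-bit Linux, AT&T syntax (file name is an assumption).
# Build on a 64-bit distro:
#   as --32 exit32.s -o exit32.o
#   ld -m elf_i386 exit32.o -o exit32
#   ./exit32 ; echo $?
        .section .text
        .globl  _start
_start:
        movl    $1, %eax            # 32-bit int $0x80 ABI: 1 = exit
        movl    $42, %ebx           # exit status
        int     $0x80               # invoke the kernel
```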

You might also want -fno-stack-protector to further simplify the asm output from C, if you're comparing book examples of how C compiles. But be aware that different GCC versions will compile differently, especially as default tuning options have changed over the years. -mtune=pentium or -mtune=pentium3 might also get GCC to choose code-gen strategies that are more like an old book. Of course, current GCC's choices are also correct, and more appropriate for newer CPUs, just different from old GCC!

Also, the i386 SysV ABI as used on Linux has changed to require 16-byte stack alignment before a call instruction. GCC had been choosing to align the stack that way as a performance optimization, and then accidentally started generating 32-bit code (e.g. using movaps) that relied on it. Calling libc functions will usually still happen to work in 32-bit code with only 4-byte stack alignment, but if you see modern GCC reserving more space than it needs, that's usually why.
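To make the alignment arithmetic concrete, here is a hand-written sketch of a 32-bit main that keeps ESP 16-byte aligned at the call. The label msg and the file name align32.s are hypothetical, and the build command assumes 32-bit libc development files are installed.

```asm
# align32.s: 32-bit Linux, AT&T syntax (file name is an assumption).
# Assumed build:  gcc -m32 -fno-pie -no-pie align32.s -o align32
        .section .rodata
msg:    .asciz  "hello from 32-bit code"

        .section .text
        .globl  main
main:
        # On entry ESP % 16 == 12: the caller's call pushed a 4-byte
        # return address onto a 16-byte aligned stack.
        pushl   %ebp                # ESP % 16 == 8
        movl    %esp, %ebp
        subl    $8, %esp            # ESP % 16 == 0; only 4 bytes are needed
        movl    $msg, (%esp)        # first (and only) stack argument
        call    puts                # stack is 16-byte aligned here
        xorl    %eax, %eax          # return 0
        leave                       # restore ESP and EBP
        ret

        .section .note.GNU-stack,"",@progbits   # mark the stack non-executable
```

The subl reserves 8 bytes even though puts takes one 4-byte argument; the extra 4 bytes are padding that exists only for alignment.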


Nitpick:

I found some of assemblers, like NASM, MASM, and GAS. Their syntax is different(particularly pseudo-directives), NASM and MASM support Intel syntax, GAS support AT&T Syntax.

Directives are orthogonal to instruction syntax, and there are different flavours of Intel syntax, especially between MASM vs. NASM. (As well as major differences between MASM and NASM for directives).

GAS also supports .intel_syntax noprefix to use a somewhat MASM-like instruction syntax, but still with GAS directives.

(Similarly, YASM has an AT&T mode to use AT&T instruction syntax, but still NASM/YASM preprocessor and directives.)
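As a small illustration of that split between directives and instruction syntax, here is a sketch of an x86-64 Linux program for GAS in .intel_syntax noprefix mode. The file name and the exit status 42 are arbitrary; the AT&T equivalents are shown in the comments.

```asm
# exit_intel.s: x86-64 Linux, assembled with GAS (file name is an assumption).
# Assumed build:  as exit_intel.s -o exit_intel.o && ld exit_intel.o -o exit_intel
        .intel_syntax noprefix      # switch instruction syntax only
        .section .text              # directives are still GAS directives
        .globl  _start
_start:
        mov     eax, 60             # AT&T: mov $60, %eax  (60 = exit)
        mov     edi, 42             # AT&T: mov $42, %edi  (exit status)
        syscall
```

The operand order and the register and immediate spelling change, but .section and .globl are written exactly as in AT&T mode.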
