I'm looking at an example of Intel assembly being assembled by NASM. It has the instruction:
add byte [ebx], 32
How would I know from the documentation what "byte" does?
The book I'm reading explains in the text how "byte" tells the assembler that we're only writing a single byte to ebx. It's not clear to me how I'd know this from looking at the documentation.
From examples in the book and elsewhere, it looks like the ADD instruction has two forms:
ADD <dest> <src>
ADD <size> <dest> <src>
However, when I look at the Intel documentation[1], I don't see anything that looks like either of my forms. Each of the instructions given in the table has only a single comma, which, to me, makes it seem like all the corresponding opcodes take only two inputs. There is a table that gives "Instruction Operand Encoding", but operands 3 and 4 are NA. Looking around the web, most sites don't mention anything about the size parameter (let alone whether it applies to my processor).
I'm assembling on an Intel(R) Core(TM) i7-6700HQ CPU in 386 mode:
nasm -f elf -g -F stabs -o $OBJECTFILE $1
ld -m elf_i386 -o $BUILDNAME $OBJECTFILE
Maybe the instruction takes an extra operand for 386 but not for newer architectures?
[1] "Intel® 64 and IA-32 Architectures Software Developer’s Manual Combined Volumes: 1, 2A, 2B, 2C, 2D, 3A, 3B, 3C, 3D, and 4", Vol. 2A 3-31 page 605 in the pdf.
https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html
Answer:
The byte keyword in the asm source sets the operand-size attribute of the instruction. In machine code, operand-size is implied by the numeric opcode (byte vs. non-byte form) and, for 16/32/64-bit operand-size, by the current CPU mode plus any prefixes on the non-byte opcode. Intel's manual documents the machine-code forms, not asm source syntax.
See the following for how that gets encoded into machine code:
- Is there a default operand size in the x86-64 (AMD64) architecture?
- How does x86 handle byte vs word addressing when executing instructions and reading/writing data?
- How to encode an instruction when we just know the hex for opcode
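To make the opcode-and-prefix point concrete, here is a minimal Python sketch that hand-encodes add [ebx], 32 for 32-bit mode at three operand sizes. The function and constant names are made up for illustration; this is not how NASM is implemented, and it only covers the imm8 forms with [ebx] as the destination.

```python
# Hypothetical helper for illustration; not part of NASM or any real library.
# Hand-encodes `add <size> [ebx], imm` for 32-bit mode, imm8 forms only.

# ModRM byte: mod=00 (memory, no displacement), reg=000 (the /0 that selects
# ADD within opcodes 80/81/83), rm=011 ([ebx]).
MODRM_SLASH0_EBX_MEM = 0b00_000_011  # 0x03


def encode_add_mem_ebx(size: str, imm: int) -> bytes:
    """Return machine code for `add <size> [ebx], imm` in 32-bit mode."""
    if not -128 <= imm <= 127:
        raise ValueError("sketch only covers immediates that fit in a signed byte")
    imm8 = imm & 0xFF
    if size == "byte":
        # 80 /0 ib = ADD r/m8, imm8: the opcode itself says "byte".
        return bytes([0x80, MODRM_SLASH0_EBX_MEM, imm8])
    if size == "dword":
        # 83 /0 ib = ADD r/m32, imm8: non-byte opcode at the mode's default size.
        return bytes([0x83, MODRM_SLASH0_EBX_MEM, imm8])
    if size == "word":
        # Same non-byte opcode, plus a 66 operand-size prefix to select 16-bit.
        return bytes([0x66, 0x83, MODRM_SLASH0_EBX_MEM, imm8])
    raise ValueError(f"unsupported size keyword: {size!r}")


def hexdump(code: bytes) -> str:
    """Format bytes as space-separated lowercase hex."""
    return " ".join(f"{b:02x}" for b in code)


if __name__ == "__main__":
    for size in ("byte", "word", "dword"):
        print(f"add {size} [ebx], 32  ->  {hexdump(encode_add_mem_ebx(size, 32))}")
```

The source-level size keyword never appears as an operand in the output: it only changes which opcode (80 vs. 83) is used and whether a 66 prefix is emitted.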
This is why assemblers have manuals, too, separate from the ISA manual.
For example, see NASM's manual, Chapter 3: The NASM Language. Section 3.1, Layout of a NASM Source Line, describes the syntax layout of a mnemonic and operands. Unfortunately, that section neglects to mention the size overrides you can put in front of operands; it only covers the prefixes like o16 that you can put in front of the mnemonic (a clunkier, manual way to specify the operand-size).
The manual does have examples of operand-size overrides in multiple places. For example, 2.2 Quick Start for MASM Users points out that NASM needs mov word [var], 2 even if var is declared as var dw 0, which in MASM would magically imply an operand-size for that instruction. It also mentions the same specifiers when used with strict to force the encoding of the immediate, not just the operand-size: add ecx, strict dword 123 forces the add r/m32, imm32 form, while add ecx, dword 123 still allows the add r/m32, imm8 form. (https://www.felixcloutier.com/x86/add)
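The difference between the two immediate encodings can be sketched in Python. This is a hand-written model that assumes the assembler picks the shorter imm8 form whenever the value fits and strict is absent; the function name and the strict_dword parameter are hypothetical, not NASM API.

```python
import struct

# ModRM byte: mod=11 (register direct), reg=000 (/0 selects ADD), rm=001 (ecx).
MODRM_SLASH0_ECX = 0b11_000_001  # 0xC1


def encode_add_ecx_imm(imm: int, strict_dword: bool = False) -> bytes:
    """Model of `add ecx, [strict] dword imm` in 32-bit mode (illustrative only)."""
    if not -2**31 <= imm < 2**32:
        raise ValueError("immediate does not fit in 32 bits")
    fits_signed_byte = -128 <= imm <= 127
    if fits_signed_byte and not strict_dword:
        # 83 /0 ib = ADD r/m32, imm8 (the CPU sign-extends imm8 to 32 bits).
        return bytes([0x83, MODRM_SLASH0_ECX, imm & 0xFF])
    # 81 /0 id = ADD r/m32, imm32, with the immediate stored little-endian.
    return bytes([0x81, MODRM_SLASH0_ECX]) + struct.pack("<I", imm & 0xFFFFFFFF)


if __name__ == "__main__":
    print(encode_add_ecx_imm(123).hex())                     # short form
    print(encode_add_ecx_imm(123, strict_dword=True).hex())  # forced imm32 form
```

Both encodings have 32-bit operand-size and do the same thing at run time; strict only changes how many bytes the immediate occupies.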
Some other x86 assemblers, like GAS and clang/LLVM, default to AT&T syntax, which is very different from the syntax Intel's manuals use to talk about instructions. In AT&T syntax, operand-size is specified (if needed) by a suffix on the instruction mnemonic, like movb $'a', (%rdi), instead of MASM's mov byte ptr [rdi], 'a' (note the extra ptr keyword) or NASM's mov byte [rdi], 'a'.
Assembly syntax depends on the tool, not the ISA. Intel's manuals, especially vol.2, the part that lists every available instruction, do not specify the syntax details of how to specify operand-size in asm source when it would be ambiguous.
In asm source, a register can imply operand-size
For instructions where both operands must be the same size, a register operand implies the operand-size in asm source syntax, so you don't need to specify it. e.g. add eax, [rdi] doesn't need to be add eax, dword [rdi].
But mov-immediate to memory (or any other op mem,imm instruction) is ambiguous, as are one-operand memory instructions like inc [mem], and the rare instructions where operands don't have to be the same size, like shl [rdi], cl (destination size could be b/w/d/q) or movzx eax, [rdi] (source size could be byte or word).
See When do I need to specify the size of the operand in Assembly?
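The rule described above can be written down as a toy Python check. This is a deliberately simplified model with a small hard-coded register table and a hypothetical needs_size_keyword function; it is not how NASM decides, and it ignores many instructions.

```python
# Toy model of the "when is a size keyword needed?" rule; illustrative only.
REGISTER_BITS = {
    "al": 8, "cl": 8, "ax": 16, "cx": 16,
    "eax": 32, "ebx": 32, "ecx": 32, "edi": 32,
    "rax": 64, "rdi": 64,
}
SIZE_KEYWORDS = {"byte", "word", "dword", "qword"}
# Mnemonics whose register operand says nothing about the memory operand's size.
MIXED_SIZE_MNEMONICS = {"movzx", "movsx", "shl", "shr", "sar", "rol", "ror"}


def is_memory(operand: str) -> bool:
    return "[" in operand


def has_size_keyword(operand: str) -> bool:
    return operand.split()[0] in SIZE_KEYWORDS


def needs_size_keyword(mnemonic: str, *operands: str) -> bool:
    """True if some memory operand's size cannot be inferred from the source."""
    ops = [op.strip().lower() for op in operands]
    unsized_memory = [op for op in ops if is_memory(op) and not has_size_keyword(op)]
    if not unsized_memory:
        return False  # no memory operand, or it already carries a size keyword
    if mnemonic.lower() in MIXED_SIZE_MNEMONICS:
        return True   # a register operand does not pin down the memory size
    # Same-size instructions: any register operand implies the size.
    return not any(op in REGISTER_BITS for op in ops)


if __name__ == "__main__":
    print(needs_size_keyword("add", "eax", "[rdi]"))    # register implies dword
    print(needs_size_keyword("mov", "[rdi]", "1"))      # nothing implies a size
    print(needs_size_keyword("movzx", "eax", "[rdi]"))  # source size is open
```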
Good assemblers like NASM will error on that ambiguity. Less-good assemblers will sometimes just pick a default: GAS, for example, picks dword for instructions other than MOV, such as add $1, (%rdi), and only recently added a warning about that!
Similarly, [rdi + rax] specifies 64-bit address-size, while [edi + eax] would be 32-bit address-size. The default address-size (in asm source) for something like [1234] is the bitness of the current mode, i.e. not using a 67 address-size prefix in the machine code.
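As a rough sketch of when the 67 prefix shows up, the following Python function maps (mode, address-size) pairs to whether a prefix is needed. The function name is hypothetical; the table reflects that each mode has one default address-size and one alternative reachable via the prefix.

```python
# Address sizes each CPU mode can encode: the default, plus one via a 67 prefix.
ENCODABLE_ADDRESS_BITS = {
    16: (16, 32),
    32: (32, 16),
    64: (64, 32),
}


def needs_address_size_prefix(mode_bits: int, address_bits: int) -> bool:
    """True if a 67 prefix is required for this address-size in this mode."""
    if mode_bits not in ENCODABLE_ADDRESS_BITS:
        raise ValueError(f"unknown mode: {mode_bits}")
    if address_bits not in ENCODABLE_ADDRESS_BITS[mode_bits]:
        raise ValueError(
            f"{address_bits}-bit addressing is not encodable in {mode_bits}-bit mode"
        )
    # The mode's own bitness is the default; anything else needs the prefix.
    return address_bits != mode_bits


if __name__ == "__main__":
    print(needs_address_size_prefix(64, 64))  # [rdi + rax] in 64-bit mode
    print(needs_address_size_prefix(64, 32))  # [edi + eax] in 64-bit mode
```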
Again, this is 100% about asm source-level syntax. Encoding an instruction into machine code for a certain mode necessarily implies an operand-size.
That's why you need to tell the assembler what mode the CPU will be decoding in, e.g. with NASM's bits 32 directive if you're making a flat binary or switching modes in a bootloader, or more normally by assembling with nasm -felf64 to make a 64-bit object file. In that case, adding bits 32 to the source would let you put mismatched machine code into the wrong object file, instead of getting an error at assemble time from push ebx not being encodable for 64-bit mode.