I am experimenting with x64 assembly instructions using the Miasm framework. Consider the snippet below, where I disassemble and reassemble the machine code of LEA RAX, [RIP + 1]:
from miasm.analysis.machine import Machine
machine = Machine("x86_64").mn
ins = machine.dis(b"\x48\x8d\x05\x01\x00\x00\x00", 64)
print(ins)
>>> LEA RAX, QWORD PTR [RIP + 0x1]
machine.asm(ins)
>>> [b'J\x8d\x05\x01\x00\x00\x00', b'K\x8d\x05\x01\x00\x00\x00', b'H\x8d\x05\x01\x00\x00\x00', b'I\x8d\x05\x01\x00\x00\x00', b'fH\x8d\x05\x01\x00\x00\x00', b'fI\x8d\x05\x01\x00\x00\x00', b'fK\x8d\x05\x01\x00\x00\x00', b'fJ\x8d\x05\x01\x00\x00\x00']
for i in machine.asm(ins):
    print(machine.dis(i, 64))
>>> LEA RAX, QWORD PTR [RIP + 0x1]
>>> LEA RAX, QWORD PTR [RIP + 0x1]
(...)
>>> LEA RAX, QWORD PTR [RIP + 0x1]
My questions are: why exactly are there so many encodings that correspond to the same instruction, and in which way do they differ? Is there any difference at all if I use one instead of another? My goal is to write a Python script to automate the generation of a rather complex assembly source file, so I'd like to double check that I won't run into issues because I "choose" the wrong one.
CodePudding user response:
Refer to the Intel Software Development Manuals for details on the instruction encoding.
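What you can observe here is that the instruction begins with a REX prefix to indicate that the data width is 64 bit. This REX prefix encodes 4 bits (the W, R, X, and B bits), but only the R bit (which must be clear to select RAX instead of R8) and the W bit (which must be set to select 64 bit operation instead of 32 bit operation) are relevant. The other two bits, X and B, extend the index and base register fields, but your RIP-relative memory operand has neither an index nor a base register.
So whatever you set these two bits to, the result will come out to be the same. This is why four different REX prefixes appear: 0x48, 0x49, 0x4A, and 0x4B, which Python prints as H, I, J, and K. The other four encodings additionally start with a 0x66 operand-size prefix (printed as f), which is ignored when REX.W is set, so they decode to the same instruction too. The forms without the 0x66 prefix are one byte shorter, and 48 8D 05 (REX.W alone) is the encoding you started from.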
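To see this in the bytes themselves, here is a minimal sketch in plain Python (no Miasm needed) that splits the prefix of each candidate into its bits. The helper name rex_bits is hypothetical, and the rule used to pick one encoding (shortest first, then lowest byte value) is only an assumption for illustration, not something Miasm or the Intel manual prescribes.

```python
# Candidate encodings copied from the machine.asm(ins) output in the question.
candidates = [
    b"J\x8d\x05\x01\x00\x00\x00",
    b"K\x8d\x05\x01\x00\x00\x00",
    b"H\x8d\x05\x01\x00\x00\x00",
    b"I\x8d\x05\x01\x00\x00\x00",
    b"fH\x8d\x05\x01\x00\x00\x00",
    b"fI\x8d\x05\x01\x00\x00\x00",
    b"fK\x8d\x05\x01\x00\x00\x00",
    b"fJ\x8d\x05\x01\x00\x00\x00",
]


def rex_bits(encoding):
    """Return (has_66_prefix, W, R, X, B) for one candidate encoding.

    Hypothetical helper: it only handles the layout seen above, i.e. an
    optional 0x66 prefix followed directly by a REX byte (0x40-0x4F).
    """
    has_66 = encoding[0] == 0x66
    rex = encoding[1] if has_66 else encoding[0]
    if rex & 0xF0 != 0x40:
        raise ValueError("expected a REX prefix byte")
    w = (rex >> 3) & 1  # 64-bit operand size
    r = (rex >> 2) & 1  # extends ModRM.reg (RAX vs. R8)
    x = (rex >> 1) & 1  # extends SIB.index (unused here)
    b = rex & 1         # extends ModRM.rm / SIB.base (unused here)
    return has_66, w, r, x, b


for enc in candidates:
    print(enc.hex(), rex_bits(enc))

# Assumed selection rule: shortest encoding, ties broken by byte value.
canonical = min(candidates, key=lambda e: (len(e), e))
print("chosen:", canonical.hex())
```

W and R are the same in every candidate; only X, B, and the presence of the 0x66 prefix vary. With the selection rule assumed here, the chosen encoding is the original 48 8d 05 01 00 00 00.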