It's very common to hear or read comments like: "assembler is practically machine code, but using symbols instead of direct binary codes".
My question is: how much truth does that kind of statement hold in general?
CodePudding user response:
If you start with a "plain text" representation of machine code (e.g. with numeric opcodes replaced by mnemonics and numeric addresses replaced by labels), then most assemblers also:
a) Allow more complicated expressions in operands (e.g. allow something like "mov eax,(1234*5+6)/7", where the assembler calculates the right value for a "mov eax,882" instruction).
b) Have preprocessors allowing you to write macros, etc. Often this includes conditional code, and sometimes it's powerful enough to allow you to create a new language and/or high level language constructs (e.g. imagine "while" and "endwhile" macros).
c) Can auto-select the best encoding. For example, if an instruction could be encoded with either a 32-bit immediate operand or an 8-bit immediate operand that is sign-extended to 32 bits, the assembler can look at the operand and determine whether the shorter sign-extended encoding will work.
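As a rough illustration of point (c), here is a minimal Python sketch of how an assembler might choose between two x86 encodings of "add reg32, imm". The function name encode_add_imm and the REG32 table are made up for this example. The opcode bytes (83 /0 ib and 81 /0 id) are the standard x86 forms, but the sketch ignores other special cases such as the dedicated short form for eax.

```python
import struct

# Register numbers as used in the x86 ModRM byte (32-bit general registers).
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}


def encode_add_imm(reg: str, imm: int) -> bytes:
    """Encode `add reg, imm`, picking the shorter form when the value fits."""
    # mod=11 (register operand), reg field=0 (the /0 that selects ADD), rm=register
    modrm = 0xC0 | (0 << 3) | REG32[reg]
    if -128 <= imm <= 127:
        # 83 /0 ib: 8-bit immediate that the CPU sign-extends to 32 bits
        return bytes([0x83, modrm]) + struct.pack("<b", imm)
    # 81 /0 id: full 32-bit immediate, little-endian
    return bytes([0x81, modrm]) + struct.pack("<I", imm & 0xFFFFFFFF)


if __name__ == "__main__":
    print(encode_add_imm("ebx", 5).hex(" "))     # short form
    print(encode_add_imm("ebx", 1000).hex(" "))  # long form
```

With this, encode_add_imm("ebx", 5) yields the 3-byte form 83 C3 05, while encode_add_imm("ebx", 1000) needs the 6-byte form 81 C3 E8 03 00 00. The source line looks the same in both cases; only the assembler's choice differs.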
All of these things make a huge difference for source code maintenance - e.g. if you add a few instructions somewhere you don't have to manually re-calculate all the addresses/offsets for call/jump/branch targets and data accesses; you can do a "#define COST_OF_CHEESE 123" in one place to make it easy to change later (without having to find everywhere the value was used); etc.
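To make the maintenance point concrete, here is a small Python sketch of a two-pass assembler for a made-up instruction set. Everything in it is hypothetical (the three mnemonics, their opcode bytes, the "NAME = value" constant syntax); it only shows how labels and named constants get turned into numbers so that you never have to recalculate them by hand.

```python
# Hypothetical toy instruction set (not a real CPU): mnemonic -> (opcode, operand bytes)
OPCODES = {"nop": (0x00, 0), "load": (0x01, 1), "jmp": (0x02, 1)}


def assemble(source: str) -> bytes:
    symbols = {}   # label or constant name -> numeric value
    program = []   # (mnemonic, operand text or None)
    address = 0

    # Pass 1: record constants and label addresses, and collect instructions.
    for raw in source.splitlines():
        line = raw.split(";")[0].strip()   # drop comments and whitespace
        if not line:
            continue
        if "=" in line:                    # constant definition: NAME = value
            name, value = line.split("=")
            symbols[name.strip()] = int(value)
        elif line.endswith(":"):           # label: remember the current address
            symbols[line[:-1]] = address
        else:
            mnemonic, _, operand = line.partition(" ")
            program.append((mnemonic, operand.strip() or None))
            address += 1 + OPCODES[mnemonic][1]

    # Pass 2: emit bytes, replacing symbol names with their values.
    out = bytearray()
    for mnemonic, operand in program:
        opcode, operand_bytes = OPCODES[mnemonic]
        out.append(opcode)
        if operand_bytes:
            value = symbols[operand] if operand in symbols else int(operand)
            out.append(value & 0xFF)
    return bytes(out)


SOURCE = """
COST_OF_CHEESE = 123
start:
    load COST_OF_CHEESE
    jmp end
    nop
end:
    jmp start
"""

if __name__ == "__main__":
    print(assemble(SOURCE).hex(" "))
```

If you add another "nop" right after "start:", the address of "end" moves from 5 to 6 and the assembler patches the "jmp end" operand for you. In raw machine code you would have to find and fix that byte yourself.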
CodePudding user response:
Here's an example, using y86.
add %rdi, %rsi
Becomes something like
00 01 02
Basically, "00" is the byte representation of the opcode "add". When the computer sees an "add", it knows to interpret the next two bytes as registers (this is slightly more complex in x86). "01" and "02" are the byte 'names' or encodings for the registers %rdi and %rsi respectively.
Some parts of this example may not be entirely reflective of reality, but this is basically the correspondence between machine code and assembly. Each instruction is a short sequence of bytes (1-5 in this simplified picture) that starts with an opcode, and the opcode determines how the remaining bytes are interpreted.
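The same correspondence can be sketched in Python as two lookup tables. The byte values below are the illustrative ones from this answer ("00" for add, "01" and "02" for the registers), not the real Y86 or x86 encodings, and the function names are invented for the example.

```python
# Illustrative values from the example above; NOT real Y86 or x86 encodings.
OPCODES = {"add": 0x00}
REGISTERS = {"%rdi": 0x01, "%rsi": 0x02}


def assemble_line(line: str) -> bytes:
    """Turn 'add %rdi, %rsi' into opcode byte + one byte per register."""
    mnemonic, operands = line.split(maxsplit=1)
    regs = [r.strip() for r in operands.split(",")]
    return bytes([OPCODES[mnemonic]] + [REGISTERS[r] for r in regs])


def disassemble(code: bytes) -> str:
    """Reverse the mapping: the opcode byte says how to read the next bytes."""
    mnemonics = {v: k for k, v in OPCODES.items()}
    reg_names = {v: k for k, v in REGISTERS.items()}
    return f"{mnemonics[code[0]]} {reg_names[code[1]]}, {reg_names[code[2]]}"


if __name__ == "__main__":
    machine_code = assemble_line("add %rdi, %rsi")
    print(machine_code.hex(" "))
    print(disassemble(machine_code))
```

Because the mapping is a plain table lookup in both directions, assembling and then disassembling gives back the original line, which is the sense in which assembly is "machine code with symbols".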