How similar is assembly language to its corresponding machine language?

Time:10-31

It's very common to hear or read comments like: "assembly is practically machine code, but using symbols instead of raw binary codes".

My question is: how much truth does this kind of claim hold in general?

CodePudding user response:

If you start with a "plain text" representation of machine code (e.g. with opcodes replaced by mnemonics and addresses replaced by labels), then most assemblers also:

a) Allow more complicated expressions in operands (e.g. allow something like "mov eax,(1234*5+6)/7", where the assembler calculates the right value and emits a "mov eax,882" instruction).

b) Have preprocessors allowing you to write macros, etc. Often this includes conditional code, and sometimes it's powerful enough to allow you to create a new language and/or high level language constructs (e.g. imagine "while" and "endwhile" macros).

c) Can auto-select the shortest encoding. For example, if an instruction can be encoded with either a 32-bit immediate operand or an 8-bit immediate operand that is sign extended to 32 bits, the assembler can look at the operand and determine whether the shorter sign-extended encoding will work.
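To make points (a) and (c) concrete, here is a rough Python sketch of what an assembler does with a constant expression like (1234*5+6)/7 and with a small immediate operand. The function names (fold_constant, encode_mov_eax, encode_add_eax) are made up for illustration, and this is not how any particular assembler is implemented. The opcode bytes are the usual 32-bit x86 encodings (B8 for "mov eax, imm32", 05 for "add eax, imm32", 83 C0 for "add eax, imm8").

```python
import ast
import operator
import struct

# Operators allowed in a constant operand expression (illustrative subset).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.floordiv,  # integer division; real assemblers may round negatives differently
}

def fold_constant(expr):
    """Evaluate a constant integer expression at 'assembly time'."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError(f"unsupported expression: {expr!r}")
    return walk(ast.parse(expr, mode="eval").body)

def encode_mov_eax(value):
    """mov eax, imm32 -> opcode B8 followed by a little-endian 32-bit immediate."""
    return b"\xB8" + struct.pack("<I", value & 0xFFFFFFFF)

def encode_add_eax(value):
    """add eax, imm -> pick the shorter sign-extended form when the value fits in 8 bits."""
    if -128 <= value <= 127:
        return b"\x83\xC0" + struct.pack("<b", value)          # 3 bytes
    return b"\x05" + struct.pack("<I", value & 0xFFFFFFFF)     # 5 bytes

print(fold_constant("(1234*5+6)/7"))                           # 882
print(encode_mov_eax(fold_constant("(1234*5+6)/7")).hex())     # b872030000
print(encode_add_eax(5).hex())                                 # 83c005
print(encode_add_eax(1000).hex())                              # 05e8030000
```

The source line says "add eax, 5" in both cases; which bytes come out is the assembler's decision, which is one place where assembly and machine code are not strictly one-to-one.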

All of these things make a huge difference for source code maintenance. For example, if you add a few instructions somewhere, you don't have to manually re-calculate all the addresses/offsets for call/jump/branch targets and data accesses. You can also do a "#define COST_OF_CHEESE 123" in one place to make the value easy to change later (without having to find everywhere it was used).
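The address re-calculation part can be sketched as a tiny two-pass assembler. Everything here is hypothetical: the instruction set has only "nop" (1 byte) and "jmp LABEL" (2 bytes, opcode plus an absolute 8-bit address), and the opcode values 0x00 and 0x01 are invented for the example.

```python
NOP, JMP = 0x00, 0x01  # invented opcodes for a toy instruction set

def assemble(lines):
    """Two-pass assembly: first collect label addresses, then emit bytes."""
    # Pass 1: walk the program and record the address each label points at.
    labels, addr = {}, 0
    for raw in lines:
        line = raw.strip()
        if line.endswith(":"):
            labels[line[:-1]] = addr
        elif line == "nop":
            addr += 1
        elif line.startswith("jmp "):
            addr += 2
        else:
            raise ValueError(f"unknown instruction: {line!r}")

    # Pass 2: emit machine code, replacing label names with their addresses.
    out = bytearray()
    for raw in lines:
        line = raw.strip()
        if line.endswith(":"):
            continue
        if line == "nop":
            out.append(NOP)
        else:
            out += bytes([JMP, labels[line.split()[1]]])
    return bytes(out)

before = ["jmp end", "nop", "end:", "nop"]
after = ["jmp end", "nop", "nop", "end:", "nop"]  # one extra nop inserted
print(assemble(before).hex())  # 01030000
print(assemble(after).hex())   # 0104000000
```

The "jmp end" source line is identical in both programs, but the address byte changes from 03 to 04. With raw machine code you would have had to patch that byte by hand.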

CodePudding user response:

Here's an example, loosely based on Y86.

add %rdi, %rsi

Becomes something like

00 01 02

Basically, 00 is the byte representation of the opcode add. When the processor decodes an add, it knows to interpret the next two bytes as registers (this is slightly more complex in x86). 01 and 02 are the byte 'names', or encodings, for the registers %rdi and %rsi respectively.

The specific byte values in this example are illustrative and may not match the real encoding, but this is basically the correspondence between machine code and assembly. An instruction is a short sequence of bytes (1-5 in this simplified picture) that begins with an opcode, and the remaining bytes are interpreted differently depending on that opcode.
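As a rough sketch of that correspondence, the mapping can be written as two lookup tables in Python. The byte values are the illustrative ones from this answer (00 for add, 01 and 02 for %rdi and %rsi), not a real instruction encoding, and the function names are made up.

```python
# Illustrative tables using the byte values from the example above.
OPCODES = {"add": 0x00}
REGISTERS = {"%rdi": 0x01, "%rsi": 0x02}

def assemble_line(line):
    """Translate 'mnemonic reg, reg' into opcode byte + one byte per register."""
    mnemonic, operands = line.split(None, 1)
    regs = [r.strip() for r in operands.split(",")]
    return bytes([OPCODES[mnemonic]] + [REGISTERS[r] for r in regs])

def disassemble(code):
    """Reverse the mapping: the opcode byte decides how the next bytes are read."""
    names = {v: k for k, v in OPCODES.items()}
    regs = {v: k for k, v in REGISTERS.items()}
    mnemonic = names[code[0]]
    # In this toy scheme 'add' is always followed by exactly two register bytes.
    return f"{mnemonic} {regs[code[1]]}, {regs[code[2]]}"

machine_code = assemble_line("add %rdi, %rsi")
print(machine_code.hex(" "))      # 00 01 02
print(disassemble(machine_code))  # add %rdi, %rsi
```

For a simple instruction like this the translation is a direct table lookup in both directions, which is the sense in which assembly is "practically machine code".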
