How does an Assembler work at a hardware level?

I have been reading online about how an assembler works, but it is quite confusing. To summarize what I have understood so far: an assembler is basically a text parser with access to a lookup table that maps assembly language instructions to the equivalent binary instructions. Am I correct? If I am, where does this lookup table exist in the physical hardware of a CPU?

CodePudding user response：

The CPU executes machine code (a bunch of numbers). Those numbers are typically arranged according to rules that make them easier for the CPU to decode: rules for the overall format, which pieces determine the opcode, which pieces determine the instruction's operands, etc.

Assembly language is mostly (not completely - see later) a "plain text" representation of machine code. All of the rules the CPU uses (for machine code) become rules used by the assembler. For example, if the documentation for the CPU that describes machine code for some instructions says "bits 4 to 7 determine which register is used for 1st operand" then the assembler might have a function (or maybe a table) to convert register names into the right values for bits 4 to 7. Something similar happens for instruction groups (a function or table to convert the instruction's mnemonic into however many opcode bytes are needed).
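
To make the "table" idea concrete, here is a minimal sketch in Python. The instruction format, the opcode values and the register names are all made up for illustration (they do not belong to any real CPU); the only thing borrowed from the text above is the idea that bits 4 to 7 hold the register for the 1st operand.

```python
# Hypothetical 16-bit instruction format, invented for this sketch (not a real CPU):
#   bits 8-15: opcode
#   bits 4-7 : register used for the 1st operand
#   bits 0-3 : register used for the 2nd operand
OPCODES = {"mov": 0x01, "add": 0x02, "sub": 0x03}   # mnemonic -> opcode value
REGISTERS = {f"r{i}": i for i in range(16)}          # register name -> 4-bit number


def encode(line: str) -> int:
    """Convert one line such as 'add r3, r5' into a 16-bit instruction word."""
    mnemonic, operands = line.strip().lower().split(None, 1)
    first, second = (name.strip() for name in operands.split(","))
    return (OPCODES[mnemonic] << 8) | (REGISTERS[first] << 4) | REGISTERS[second]


print(f"{encode('add r3, r5'):#06x}")  # prints 0x0235
```

Both tables are ordinary data inside the assembler program. A real assembler would need many more cases (immediates, addressing modes, variable-length encodings), but the principle is the same.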

All of the stuff used to convert text into machine code (functions, tables, etc) is created by whoever wrote the assembler (to comply with the CPU's documentation for how everything is encoded into machine code). None of this comes from the CPU itself; and most assemblers can run on a completely different CPU (e.g. most 80x86 assemblers can easily be ported to run on ARM or PowerPC or MIPS or ...).

On top of this the assembler also has to provide useful error checking and reporting (so that if there's a mistake in the assembly language source code it's easy for a programmer to figure out what is wrong and where, e.g. using a descriptive error message with a line number); plus support for preprocessing (macros, etc); plus support for various output file formats (object files to suit different linkers, raw output file types like "flat binary", etc); plus directives (to control intended CPU mode, alignment, etc) and a way for a programmer to describe "data that is not code".
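
As a rough illustration of two of those extras (error messages with line numbers, and a directive for "data that is not code"), here is a small Python sketch. Everything in it is an assumption made for the example: the one-byte `nop`/`halt` encodings, the `.byte` directive name, the `;` comment syntax and the `AssemblyError` class are hypothetical, and the output is a raw "flat binary" byte string.

```python
class AssemblyError(Exception):
    """Raised with a line number so the programmer can find the mistake."""


OPCODES = {"nop": 0x00, "halt": 0xFF}  # hypothetical one-byte instructions


def assemble(source: str) -> bytes:
    output = bytearray()
    for line_number, raw_line in enumerate(source.splitlines(), start=1):
        line = raw_line.split(";", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        parts = line.split(None, 1)
        word = parts[0].lower()
        if word == ".byte":  # directive: emit raw data, not an instruction
            if len(parts) < 2:
                raise AssemblyError(f"line {line_number}: .byte needs at least one value")
            for item in parts[1].split(","):
                try:
                    value = int(item.strip(), 0)  # base 0 accepts 12, 0x0C, 0b1100
                except ValueError:
                    raise AssemblyError(
                        f"line {line_number}: bad number {item.strip()!r}") from None
                if not 0 <= value <= 255:
                    raise AssemblyError(
                        f"line {line_number}: value {value} does not fit in a byte")
                output.append(value)
        elif word in OPCODES:
            output.append(OPCODES[word])
        else:
            raise AssemblyError(f"line {line_number}: unknown instruction {word!r}")
    return bytes(output)


print(assemble("nop ; do nothing\n.byte 1, 0x10\nhalt").hex())  # prints 000110ff
```

Feeding it `"nop\nbogus"` raises `AssemblyError` with a message that starts with `line 2:`, which is the kind of feedback a programmer needs.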

All of this other stuff is also created by whoever wrote the assembler.

CodePudding user response：

First, there is an Instruction Set Architecture (ISA) — this is a specification published as text for human consumption, usually by a CPU vendor. This document specifies each and every machine code instruction that is available for programs to use and for processors to implement. An ISA specification goes to the fundamental boundary between software and hardware; to the fundamental agreement (or meeting of the minds) between software programmers and hardware implementers.

As a convenience, the ISA specification may also include a "preferred" or suggested assembly form for each machine code instruction.

An assembler is a program written by people who are using an ISA specification to inform the translation of assembly code into machine code. The mechanism they use to accomplish translation is contained within the program code of the assembler, and may involve a table with pattern matching, or may be done using ordinary programming (e.g. if-then statements), all informed by the ISA specification. There's no one right way to design an assembler.
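
Here is a small Python sketch of the two styles mentioned above, side by side: a table with pattern matching, and ordinary if-then code. The three instructions and their encodings (`inc rN`, `dec rN`, `ret`) are invented for the example and are not taken from any real ISA.

```python
import re

# Style 1: a table of (pattern, encoder) pairs.
# Hypothetical encodings: inc rN -> 0x10 | N, dec rN -> 0x20 | N, ret -> 0xF0
PATTERNS = [
    (re.compile(r"inc\s+r([0-7])"), lambda m: 0x10 | int(m.group(1))),
    (re.compile(r"dec\s+r([0-7])"), lambda m: 0x20 | int(m.group(1))),
    (re.compile(r"ret"), lambda m: 0xF0),
]


def encode_with_table(line: str) -> int:
    text = line.strip().lower()
    for pattern, build in PATTERNS:
        match = pattern.fullmatch(text)
        if match:
            return build(match)
    raise ValueError(f"cannot encode {line!r}")


# Style 2: the same rules written as ordinary if-then statements.
def encode_with_branches(line: str) -> int:
    words = line.strip().lower().split()
    if words == ["ret"]:
        return 0xF0
    if len(words) == 2 and words[1] in {f"r{i}" for i in range(8)}:
        number = int(words[1][1:])
        if words[0] == "inc":
            return 0x10 | number
        if words[0] == "dec":
            return 0x20 | number
    raise ValueError(f"cannot encode {line!r}")


for line in ["inc r3", "dec r7", "ret"]:
    print(line, hex(encode_with_table(line)), hex(encode_with_branches(line)))
```

Both functions produce the same bytes; which style an assembler author picks is a matter of taste and maintainability, not something the ISA dictates.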

The translation (assembling) is entirely under the control of the assembler program, without consulting the hardware. Consider, for example, that we can run an assembler on Windows x64 that accepts assembly source and generates machine code for ARM Linux. Two very different processors are involved: one is actually running the assembler program, and the other is the intended target of the assembled machine code. So, there is no direct relationship between the processor running the assembler and the generated machine code.
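
One small way to see this independence in code: the assembler writes out bytes in whatever layout the target expects, and never asks the host CPU how it would do it. The Python sketch below assumes a hypothetical target with 32-bit instruction words; `emit_words` and the word value `0x12345678` are placeholders, not a real instruction or a real assembler API.

```python
import struct
import sys


def emit_words(words, target_byte_order="little"):
    """Serialize 32-bit instruction words in the TARGET's byte order."""
    fmt = "<I" if target_byte_order == "little" else ">I"
    return b"".join(struct.pack(fmt, word) for word in words)


# The host's own byte order is shown only for contrast; emit_words never uses it.
print("host byte order:", sys.byteorder)
print(emit_words([0x12345678], "little").hex())  # prints 78563412
print(emit_words([0x12345678], "big").hex())     # prints 12345678
```

The output is identical no matter which machine runs the script, which is exactly why a cross-assembler is possible.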

There can be many assemblers for the same ISA. The authors of a particular assembler will publish a specification for their assembly language, which shows how to specify and accomplish the ISA's machine code instructions using their versions of assembly mnemonics and other syntax (like for addressing modes, labels, etc.).

The hardware is also designed by people who are using this ISA specification to implement the machine code instructions and all their variations. There may be tables, and there may be microcode (which some might consider to be lookup "tables" describing the actions needed to accomplish an instruction). As with the assembler, there are many possible approaches and no one right way to implement an instruction set.
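
Purely as a software model of that idea (real hardware is built from logic circuits, not Python), the sketch below decodes a 16-bit instruction word by slicing out bit fields and then looks up the action in a table. The instruction format, the opcode numbers and the 16 registers of 16 bits each are all assumptions invented for this example.

```python
# Hypothetical format: bits 8-15 opcode, bits 4-7 1st register, bits 0-3 2nd register.
def do_mov(regs, first, second):
    regs[first] = regs[second]


def do_add(regs, first, second):
    regs[first] = (regs[first] + regs[second]) & 0xFFFF  # wrap to 16 bits


def do_sub(regs, first, second):
    regs[first] = (regs[first] - regs[second]) & 0xFFFF


# The "table" describing what each opcode does (loosely analogous to microcode).
ACTIONS = {0x01: do_mov, 0x02: do_add, 0x03: do_sub}


def execute(regs, word):
    """Decode one instruction word and perform its action on the register list."""
    opcode = (word >> 8) & 0xFF
    first = (word >> 4) & 0xF
    second = word & 0xF
    ACTIONS[opcode](regs, first, second)


regs = [0] * 16
regs[3], regs[5] = 2, 7
execute(regs, 0x0235)   # in this made-up ISA: add r3, r5
print(regs[3])          # prints 9
```

Note that the decoder and an assembler for this made-up format never talk to each other; they only agree because both follow the same written description of the bit layout.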

Thus, fundamental to both software and hardware is agreement on the Instruction Set Architecture. Software programmers accept that hardware will implement this specification, and hardware designers accept that software will use this specification.