Computer architecture and compiler

Time:01-08

I have a doubt based on my understanding of the following assumption: 1. For every CPU architecture, the assembly language set will differ.

So my doubt is: how does a common compiler/interpreter convert source code into .asm code, given that not all computers have the same architecture?

My guess at the answer is: "While we are installing the compiler on our system, it learns the architecture of the computer it is being installed on, and so it will convert C/Python/Java to that CPU's corresponding .asm code."

So, if my guess above is correct, then I understand that while designing a compiler, the developers need to integrate every CPU architecture's assembly set into the compiler.

Note: my doubts might be silly; I have never done a compiler course, and I am an ECE grad. Thanks in advance ;p

CodePudding user response:

Assembly language is not necessarily where you need to focus.

Each processor has an architecture, including in particular its instruction set. Think instruction set and machine code, not assembly language: there are countless examples of incompatible assembly languages for the same architecture (AT&T vs. Intel syntax for x86, for example). At the end of the day you need machine code, instructions.

A compiler has at least one input and one output. For example, a C compiler may turn C into asm, or it may turn C into machine code, or maybe into Java bytecode or some other bytecode. Languages are a big problem here: interestingly, C, which is difficult to standardize and is packed with implementation-defined behavior, is actually converging, while Python, Rust, etc. change and diverge over time for some reason. In any case, you have various compilers and various goals. It is certainly a case of wanting to get from a higher-level language to a lower-level one. You may have a compiler that is, from its inception, designed for one target (ISA) and aims front to back to optimize its output for that target. Others, like gcc, might have started that way (I don't know), but gcc is now very much a front end, a middle end, and a back end.

The front end essentially parses the language to be compiled, C for example, into some internal code or data structures. Adding two variables and storing the result in a third may turn into three variable-allocation steps of some size each, then the operation: get this operand, get that operand, add them, then store the result. It is very much like turning the high-level language into an assembly-like language. The bulk of the optimization generally happens at this layer, on these generic operations.

Then you have the back end, which turns this middle code/data into target-specific code, ideally with some optimization of its own; you sometimes hear the term peephole optimizer. Hand-written assembly is generally not optimized (that would cause serious problems), but compiled code can still get some target-specific optimization. Some instruction sets can add small numbers using an immediate, while on others you may need to load that immediate into a register and then do the operation, so you could save an instruction, and possibly a register, by folding that small number into the operation. The same goes for post-increment, decrement-and-branch-if-zero, and so on.
All of this is thought of as the compiler. The output at this point does not have to be assembly language (in my opinion that is the sane way to do it); it can be machine code plus some extra data to help the linker, if the compiler is designed to make objects to be linked later as one step in a toolchain.

So compilers like gcc not only have a front, a middle, and a back; gcc in particular can share the middle and back ends across languages. You can have it parse Java or the D language, etc., then optimize in the middle, then run the back end for the target. Others are single-language and single-target, and there is everything in between.

That is all great, and for many languages it is system independent: gcc will take C and turn it into asm or objects for you regardless of which operating system you intend to use, or bare metal. It is when you start to link things, and choose which libraries you link with, that you get into a target operating system. The same target, x86 for example, is not assumed to have the same system call structure, or necessarily even the same system call mechanism, on macOS vs. Windows vs. Linux. So you need a C library that on the front side presents the generic, common C library calls, but as it gets closer to the system makes system-specific calls. The file format, and the rules and properties of that file format, are defined by the operating system: we know EXE for Windows, and ELF and others for Linux, even for the same target instruction set.

To successfully get a GNU toolchain (gcc plus binutils) and C library (glibc) to the point where you can build even a COMMAND LINE program, there are a LOT of moving parts. And when you take a pre-built GNU toolchain, for example x86 for Windows, it will have been built knowing the preferred file format; it will have a C library built for that operating system's system calls, linker scripts associated with the C library and its bootstrap code, and so on.

As mentioned in the comments, some compilers are, or can be, cross compilers. The compiler binary itself can be an x86 Linux program, for example, while its output is ARM instructions. Some toolchains are architected so that, at run time, one compiler binary can if-then-else its way to various targets; llvm is designed this way-ish. The GNU tools are designed so that at compile time, meaning the time you compile the toolchain itself, you pick the target and a laundry list of options, and the binary is built to match those options. So if you want a GNU MIPS C compiler and an ARM C compiler, you need to build gcc/binutils/glibc two times and install them into different directories.
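A rough sketch of that "build it twice" workflow with the GNU configure system. The version number, target triplets, and install prefixes here are placeholders, and a real cross build involves more steps than shown (binutils first, target headers, a C library build per target):

```shell
tar xf gcc-13.2.0.tar.xz

# MIPS cross compiler: its own build tree and install prefix
mkdir build-mips && cd build-mips
../gcc-13.2.0/configure --target=mips-linux-gnu \
    --prefix=/opt/cross/mips --enable-languages=c
make && make install
cd ..

# Same source, configured and built a second time for ARM
mkdir build-arm && cd build-arm
../gcc-13.2.0/configure --target=arm-linux-gnueabihf \
    --prefix=/opt/cross/arm --enable-languages=c
make && make install
```

Each `--target` choice is baked into the resulting binaries, which is why the two compilers end up in separate prefixes rather than one binary serving both targets.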

So yes, if you are asking about compilers, you need to start with the high-level language you want to compile; eventually you need to know the target instruction set and the operating system, and the rules for all three. Then you architect the compiler to output functional code per the rules of the input language and the rules of the target instruction set (never assume that any two compilers, or two versions of the same compiler, will generate the same output from the same input). And, maybe not at compile time but more at the system level (libraries, file formats, etc.), you need a file format and system calls for the target operating system.

Turning one language into another is only the first step in a list of steps toward a useful tool.
