Writing a compiler with Assembly?

I am developing a programming language for a domain-specific problem, and I have a question about compiler construction: if a compiler is meant to generate machine code directly, do you need to study assembly in order to implement it?

If not, what are the alternative ways for such a compiler to produce binary executables?

CodePudding user response：

If performance is important, you probably don't want to try to generate assembly yourself, unless your domain-specific problem is very simple and specific. Generating efficient asm is much harder than just generating working asm. In a compiler like GCC, optimization passes are more than half the code-base, more than parsing C or even C++.
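To make the gap between "working" and "efficient" concrete, here is a minimal sketch of a naive stack-based code generator. Everything in it is an assumption for illustration, not something from the question: the tuple-based expression format, the function name `compile_to_asm`, and the target (x86-64, System V calling convention, GNU assembler in Intel syntax).

```python
# Assumptions (not from the post): x86-64, System V calling convention,
# GNU assembler in Intel syntax, 64-bit integers, at most three parameters.
# Expressions are nested tuples, e.g. ("add", ("var", "x"), ("num", 1)).
ARG_REGS = ["rdi", "rsi", "rdx"]
ASM_OPS = {"add": "add", "sub": "sub", "mul": "imul"}

def compile_to_asm(name, params, body):
    """Emit naive assembly text for a function computing `body`."""
    if len(params) > len(ARG_REGS):
        raise ValueError("this sketch supports at most three parameters")
    reg_of = dict(zip(params, ARG_REGS))
    out = ["    .intel_syntax noprefix", f"    .globl {name}", f"{name}:"]

    def emit(node):
        # Every node leaves its value on top of the machine stack.
        kind = node[0]
        if kind == "num":
            out.append(f"    push {int(node[1])}")  # constant must fit in 32 bits
        elif kind == "var":
            out.append(f"    push {reg_of[node[1]]}")
        elif kind in ASM_OPS:
            emit(node[1])
            emit(node[2])
            out.append("    pop r11")   # right operand
            out.append("    pop rax")   # left operand
            out.append(f"    {ASM_OPS[kind]} rax, r11")
            out.append("    push rax")
        else:
            raise ValueError(f"unknown node kind: {kind!r}")

    emit(body)
    out.append("    pop rax")  # return value goes in rax
    out.append("    ret")
    return "\n".join(out) + "\n"

# x * 3 + 1
asm_text = compile_to_asm(
    "scale", ["x"], ("add", ("mul", ("var", "x"), ("num", 3)), ("num", 1))
)
print(asm_text)
```

This should be correct for the tiny language it handles, but it spends 13 instructions, most of them stack traffic, on something an optimizing compiler would typically do in two or three. Closing that gap is the hard part.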

Generate something that an existing optimizer like LLVM can deal with, like LLVM-IR. Write a portable front-end for your language, leave the target-specific stuff and optimization to LLVM, or to GCC's back-end. https://llvm.org/docs/tutorial/MyFirstLanguageFrontend/index.html has a tutorial.
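As a rough illustration of what "generate LLVM-IR" can look like, the sketch below prints textual IR for a tiny expression language. The tuple-based expression format and the name `emit_function` are made up for this example; a real front-end would more likely build IR through LLVM's C++ API or a binding, as the tutorial does, rather than by string formatting.

```python
import itertools

# Hypothetical mini-language (an assumption, not from the post):
# expressions are nested tuples such as ("add", ("var", "x"), ("num", 1)),
# and every value is a 64-bit integer.
LLVM_OPS = {"add": "add", "sub": "sub", "mul": "mul"}

def emit_function(name, params, body):
    """Return textual LLVM-IR for a function that computes `body`."""
    lines = []
    counter = itertools.count()

    def emit(node):
        # Returns the IR operand (a constant or an SSA name) holding the value.
        kind = node[0]
        if kind == "num":
            return str(int(node[1]))
        if kind == "var":
            return f"%{node[1]}"
        if kind in LLVM_OPS:
            lhs = emit(node[1])
            rhs = emit(node[2])
            # Fresh temporary; assumes no parameter is itself named t0, t1, ...
            tmp = f"%t{next(counter)}"
            lines.append(f"  {tmp} = {LLVM_OPS[kind]} i64 {lhs}, {rhs}")
            return tmp
        raise ValueError(f"unknown node kind: {kind!r}")

    result = emit(body)
    args = ", ".join(f"i64 %{p}" for p in params)
    header = f"define i64 @{name}({args}) {{"
    return "\n".join([header, "entry:", *lines, f"  ret i64 {result}", "}"]) + "\n"

# x * 3 + 1
ir_text = emit_function(
    "scale", ["x"], ("add", ("mul", ("var", "x"), ("num", 3)), ("num", 1))
)
print(ir_text)
```

Saved to a `.ll` file, output like this can be handed to LLVM's tools (for example `llc` or `clang`), which then take care of optimization and target-specific code generation.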

Of course, to debug your compiler, you may want to learn some assembly to at least know where to start looking in the IR for wrong-code bugs. And of course you'd have to learn LLVM-IR, which is essentially an assembly language.

Alternatively, compiling to C is an old-school technique that still works: optimizing C compilers are widely available. (Historically well-known examples include CFortran, and C++ was originally implemented with CFront, which compiled it to C.)
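A minimal sketch of the compile-to-C approach, assuming a made-up expression format (nested tuples) and hypothetical helper names `expr_to_c` and `function_to_c`. The generated text is ordinary C that any C compiler can then optimize.

```python
# Hypothetical mini-language (an assumption, not from the post):
# expressions are nested tuples such as ("add", ("var", "x"), ("num", 2)).
C_OPS = {"add": "+", "sub": "-", "mul": "*"}

def expr_to_c(node):
    """Translate one expression node into a C expression string."""
    kind = node[0]
    if kind == "num":
        return str(int(node[1]))
    if kind == "var":
        return node[1]
    if kind in C_OPS:
        left = expr_to_c(node[1])
        right = expr_to_c(node[2])
        # Parenthesize every binary operation so C precedence never matters.
        return f"({left} {C_OPS[kind]} {right})"
    raise ValueError(f"unknown node kind: {kind!r}")

def function_to_c(name, params, body):
    """Emit a complete C function that takes and returns long."""
    args = ", ".join(f"long {p}" for p in params) or "void"
    return f"long {name}({args}) {{\n    return {expr_to_c(body)};\n}}\n"

# x * 3 + 1
c_source = function_to_c(
    "scale", ["x"], ("add", ("mul", ("var", "x"), ("num", 3)), ("num", 1))
)
print(c_source)
```

The design choice here is to let the C compiler do all the hard work: the generator only has to produce correct C, not fast C.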

Depending on your domain-specific problem, you might choose to compile to some other high-level language that matches your problem domain. Pick a language that you can target easily, and that has a good optimizing compiler or JIT run-time. e.g. Julia is reputedly good for number-crunching, I think letting you take advantage of parallelism.

C++ could be a good target if some of its template library functions work well for your problem. Ahead-of-time C++ compilers will make an executable that just depends on some libraries, not a "runtime" like a JVM. They can also compile to a library you can easily call from most other things: C and C++ foreign-function interfaces are common in most other languages. Depending on your use-case, this may be important.

This method lets you use a C, C++, Julia, or whatever debugger to see what the code your compiler generated is doing, so you only need to know that target language.

Understanding assembly concepts can still be useful for working out what C undefined behaviour might produce the symptoms you're seeing, for example when a bug in your compiler makes it emit an out-of-bounds array access. But with modern tools like clang -fsanitize=undefined, you can check for many such problems to help verify your compiler.

Also related: Learning to write a compiler

CodePudding user response：

After discussing this with Peter Cordes, we arrived at an answer. If the goal is for compilation to produce runnable machine code, there are four main ways to achieve it:

Compile down to assembly and then assemble it.
Compile to another language such as C/C++ and then compile that separately.
Use metaprogramming tricks to compile an application once that runs different code depending on what you feed it: essentially an interpreter with the optimised code built in.
Use an external framework. One option is the LLVM project (originally "Low Level Virtual Machine"), which produces object code from its intermediate representation (LLVM-IR). Your task is then to write a front-end that translates your language into LLVM-IR, and to use a linker to turn the object code it produces into an executable.

I hope this answer is useful to anyone else who runs into the same confusion.