I develop a programming language used for a domain-specific problem, and I wanted to ask a question about compiler building: if you create a compiler that intends to generate straight machine code, then do you need to study Assembly in order to implement such a compiler?
If not, what other ways are there for such a compiler to produce binary executables?
CodePudding user response:
If performance is important, you probably don't want to try to generate assembly yourself, unless your domain-specific problem is very simple and specific. Generating efficient asm is much harder than just generating working asm. In a compiler like GCC, optimization passes are more than half the code-base, more than parsing C or even C++.
Generate something that an existing optimizer like LLVM can deal with, like LLVM-IR. Write a portable front-end for your language, leave the target-specific stuff and optimization to LLVM, or to GCC's back-end. https://llvm.org/docs/tutorial/MyFirstLanguageFrontend/index.html has a tutorial.
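As a rough illustration of what "generate LLVM-IR" can mean, here is a minimal sketch of a front-end that walks a toy expression tree and emits textual IR. The AST shape (nested tuples), the emit_ir name, and the %t temporaries are all made up for this example. A real front-end would more likely build IR through LLVM's own APIs, as the tutorial does, rather than by pasting strings together.

```python
from itertools import count

# Hypothetical toy AST: an int literal, or a tuple (op, left, right).
# Maps our operators to LLVM-IR integer instructions.
OPS = {"+": "add", "-": "sub", "*": "mul"}

def emit_ir(expr):
    """Return LLVM-IR text for a main() that returns the value of expr."""
    lines = []
    fresh = count()  # source of unique temporary numbers

    def walk(node):
        # Literals are used directly as instruction operands.
        if isinstance(node, int):
            return str(node)
        op, left, right = node
        lhs, rhs = walk(left), walk(right)
        name = f"%t{next(fresh)}"
        lines.append(f"  {name} = {OPS[op]} i32 {lhs}, {rhs}")
        return name

    result = walk(expr)
    body = "\n".join(lines + [f"  ret i32 {result}"])
    return f"define i32 @main() {{\nentry:\n{body}\n}}\n"

# (2 + 3) * 4
ir_text = emit_ir(("*", ("+", 2, 3), 4))
print(ir_text)
```

If you save the printed text as a .ll file, clang should be able to optimize it and turn it into an executable without you writing any target-specific code.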
Of course, to debug your compiler, you may want to learn some assembly to at least know where to start looking in the IR for wrong-code bugs. And of course you'd have to learn LLVM-IR, which is essentially an assembly language.
Compiling to C is an old-school technique that still works: optimizing C compilers are widely available. (Historically well-known: CFortran, and C++ was originally implemented with CFront, which compiled it to C.)
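To make the compile-to-C idea concrete, here is a minimal sketch that turns a toy expression tree into a complete C program. The tuple-based AST and the to_c / emit_c names are hypothetical, chosen only for this example. Every sub-expression is fully parenthesized so the generator never has to reason about C operator precedence.

```python
# Hypothetical toy AST: an int literal, or a tuple (op, left, right)
# where op is one of "+", "-", "*".

def to_c(node):
    """Translate one expression node into a C expression string."""
    if isinstance(node, int):
        return str(node)
    op, left, right = node
    # Parenthesize everything so precedence in the output is explicit.
    return f"({to_c(left)} {op} {to_c(right)})"

def emit_c(expr):
    """Wrap the expression in a C program that prints its value."""
    return (
        "#include <stdio.h>\n\n"
        "int main(void) {\n"
        f'    printf("%d\\n", {to_c(expr)});\n'
        "    return 0;\n"
        "}\n"
    )

# (2 + 3) * 4
c_source = emit_c(("*", ("+", 2, 3), 4))
print(c_source)
```

The output is ordinary C, so any optimizing C compiler can take it from there, and you can step through it with a normal C debugger.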
Depending on your domain-specific problem, you might choose to compile to some other high-level language that matches your problem domain. Pick a language that you can target easily, and that has a good optimizing compiler or JIT run-time. e.g. Julia is reputedly good for number-crunching, I think letting you take advantage of parallelism.
C++ could be a good target if some of its template library functions work well. Ahead-of-time C++ compilers will make an executable that just depends on some libraries, not a "runtime" like a JVM or something. And they can compile to a library you can easily call from most other things: C and C++ foreign-function interfaces are common in most other languages. Depending on your use-case, this may be important.
This method will let you use a C, C++, Julia, or whatever debugger to see what the code your compiler generated is doing. So you only need to know that target language.
Understanding assembly concepts can be useful to understand what C undefined behaviour might produce the symptoms you're seeing, in case of compiler bugs like out-of-bounds array access. But with modern tools like clang -fsanitize=undefined, you can check for many such problems to help verify your compiler.
Also related: Learning to write a compiler
CodePudding user response:
In our conversation with Peter Cordes, we found the answer. When our purpose is to produce runnable machine code in the process of compilation, there are 4 primary ways to achieve it:
- Compile down to Assembly and then assemble it.
- Compile it to another language like C/C++ and then compile that separately.
- Use some metaprogramming tricks to compile an app once that runs different code as you feed it, basically an interpreter with the optimised code built in.
- Use external frameworks. One solution is the LLVM (Low Level Virtual Machine) project, which produces object code from its intermediate representation (LLVM-IR). Your task is then to design a front-end that converts your language to LLVM-IR, and to use a linker to get an executable from the object code LLVM provides.
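The third option is the least obvious of the four, so here is one possible reading of it as a sketch: the program is translated once into nested closures, and the resulting function can then be run many times on different inputs without re-walking the syntax tree. This is an assumption about what was meant, and it produces no machine code of its own. The tuple-based AST and the compile_expr name are hypothetical.

```python
import operator

# Hypothetical toy AST: an int literal, a variable name (str),
# or a tuple (op, left, right).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def compile_expr(node):
    """Translate the AST once into nested Python closures."""
    if isinstance(node, int):
        return lambda env: node
    if isinstance(node, str):
        # Variable reference: looked up in the environment at run time.
        return lambda env: env[node]
    op, left, right = node
    fn, lhs, rhs = OPS[op], compile_expr(left), compile_expr(right)
    return lambda env: fn(lhs(env), rhs(env))

# (x + 3) * 4, translated once and run twice
run = compile_expr(("*", ("+", "x", 3), 4))
print(run({"x": 2}))   # 20
print(run({"x": 10}))  # 52
```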
I hope this answer is useful to anyone else who runs into the same question.