Is implementing a direct Assembly macro in a programming language a good idea?

I'm writing a compiler for a programming language. Because I needed to write assembly code directly in my language for my tests, I created a simple macro, @, that pushes code straight into the assembly output.

func main
    @ mov rbx, 62
    ret

The generated output will be (I added comments for clarity):

main:
    ; (generated by the compiler) begin of the function
    push rbp
    mov rbp, rsp

    ; function content
    mov rbx, 62 ; copied from the source code
    
    ; (generated by the compiler) return and end of the function
    mov rax, 0
    mov rsp, rbp
    pop rbp
    ret

Should I really keep this in the language, or should it be used only for my tests? I don't know whether this macro could cause security problems or similar issues. You can do whatever you want with it: the text is pushed as-is into the output file. Of course, I will add a syntax and compatibility checker to make sure there are no assembler errors.

CodePudding user response：

TL;DR: your proposal is exactly equivalent to GNU C Basic Asm (inline asm without operands), which is essentially useless in a language like C. It can't be used safely with an optimizing compiler.

This does not appear useful unless "normal" code already has to know what registers are being used, e.g. if your language is basically already assembly, not portable. (The part you showed looks like assembly, with a label and a ret).

See also Assembly - Are there any languages other than C and C++ that allow for interaction with Assembly using inline code? re: design of inline asm support in various languages. If your language isn't itself basically a macro-assembler, but an actual compiler that will use registers itself, you need some way to have inline asm not step on the compiler's toes in terms of register usage.

What you're picturing would be basically a "domain-specific language" where your compiler has to know how every instruction uses registers (see Rust's inline asm design discussion), to know whether a register will still hold its value later, and whether it needs to save/restore a call-preserved reg to not violate the calling convention.
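For contrast, GNU C extended asm takes the opposite route: the compiler never parses the instruction, and the programmer instead declares which operands are read and written. Below is a minimal sketch of that idea, assuming x86-64, GCC or Clang, and the default AT&T syntax; the function name add_with_operands is made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper for illustration; assumes x86-64 and GCC or Clang. */
static uint64_t add_with_operands(uint64_t a, uint64_t b)
{
    /* "+r"(a): a is both read and written, in a register the compiler picks.
       "r"(b):  b is a read-only input, also in a compiler-chosen register.
       "cc":    ADD modifies the flags, so tell the compiler about that too. */
    __asm__ ("add %1, %0" : "+r"(a) : "r"(b) : "cc");
    return a;
}
```

Because the registers are placeholders (%0, %1) rather than hard-coded names, the compiler stays in charge of register allocation around the asm statement.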

e.g. your example violates both Windows x64 calling convention and x86-64 System V ABI by returning with a modified RBX. If there's no way for the user to avoid this, your syntax is unusable by real code.
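As a rough illustration of what "a way for the user to avoid this" can look like, here is how GNU C extended asm handles the same mov into RBX: the register is listed as a clobber, so the compiler knows to save and restore it around the function. This is only a sketch, assuming x86-64, GCC or Clang, and AT&T syntax; load_62_via_rbx is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical helper for illustration; assumes x86-64 and GCC or Clang. */
static uint64_t load_62_via_rbx(void)
{
    uint64_t result;
    /* RBX is call-preserved in both x86-64 System V and Windows x64.
       Listing "rbx" as a clobber tells the compiler the asm overwrites it,
       so the compiler saves/restores RBX itself and never picks RBX
       for the %0 operand. */
    __asm__ ("mov $62, %%rbx\n\t"
             "mov %%rbx, %0"
             : "=r"(result)
             :
             : "rbx");
    return result;
}
```

Without some equivalent of the clobber list, the @ macro has no way to tell the compiler that RBX needs saving.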

Also, even for instructions like mfence to be useful, you need to be able to order them wrt. compiler-generated code. If your compiler doesn't currently optimize at all, then that happens by default, but if you want to think about language design, you need to think about how a compiler could optimize it.
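To make the ordering point concrete, this is roughly how GNU C expresses it: a "memory" clobber tells the compiler not to move memory accesses across the asm statement. The sketch below assumes x86-64 and GCC or Clang; the variable and function names are invented for the example.

```c
/* Hypothetical shared state for illustration. */
static int shared_payload;
static int shared_ready;

/* Assumes x86-64 and GCC or Clang. */
static void publish_value(int value)
{
    shared_payload = value;
    /* The "memory" clobber is a compiler barrier: the store above cannot be
       sunk below this statement and the store below cannot be hoisted
       above it. The mfence instruction orders the stores at run time. */
    __asm__ volatile ("mfence" ::: "memory");
    shared_ready = 1;
}
```

A raw-text macro with no way to state this would leave an optimizing compiler free to reorder the two stores around the fence.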

See discussion on Why can't local variable be used in GNU C basic inline asm statements? - in GNU C inline asm, something like asm("cli") won't step on any registers, but as @R points out:

That's still not correct because it's not ordered with respect to the memory operations you want to protect by masking interrupts. Basically, "basic asm" is just "always a bug".

Since then, I updated my answer on that Q&A to agree: GNU C Basic Asm should never be used, except at global scope (as an alternative to a separate .s file) or in a __attribute__((naked)) function, where again you're writing the whole function body in inline asm rather than mixing it with compiler-generated asm.
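The global-scope case can be sketched as follows: the whole function is written as asm text, and C only sees a prototype. This assumes x86-64 Linux (ELF, System V ABI) with GCC or Clang; the symbol name answer_62 is made up for the example.

```c
/* Assumes x86-64 Linux (ELF, System V ABI) with GCC or Clang.
   Basic asm at global scope: the text is passed to the assembler as-is,
   so register names use a single % and nothing is substituted. */
__asm__ (
    ".pushsection .text\n\t"
    ".globl answer_62\n\t"
    ".type answer_62, @function\n"
    "answer_62:\n\t"
    "mov $62, %eax\n\t"      /* int return value goes in EAX */
    "ret\n\t"
    ".popsection\n"
);

/* C prototype so compiler-generated code can call the asm routine. */
int answer_62(void);
```

Here nothing is mixed with compiler-generated code inside a function body, so the register and ordering problems described above don't arise; the only contract is the calling convention.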

Your suggestion to allow dropping raw text into the compiler's asm output is exactly what GNU C inline asm is, except that GNU C inline asm is really just string processing (for extended asm) or just a flat string (basic asm). If you emit invalid asm syntax, it won't be detected until the assembler sees it. (clang runs its built-in assembler on asm template strings, so it does detect errors at compilation time, unlike GCC, which compiles to asm text and assembles as a separate step.)