I wrote a function to pretty-print a sudoku. Of course, this pattern could be generated by some loops, but I didn't want to go through the hassle, so this is what I came up with. (The first 5 format specifiers are just arguments that printf itself pushed to the stack; their output is overwritten after the carriage return.)
While "it works on my machine", I was wondering whether this would be or could be made portable to work across architectures, compilers, libc implementations, etc?
Of course, the assembly code may need some adjustment depending on the target platform, and the number of arguments pushed by printf depends on the current libc implementation.
#define PUSH(x) asm volatile ("push %0" : : "m"(x) :)
#define POP() asm volatile ("pop %%rax" : : : "rax")
void print(void) {
for (uint8_t i = 1; i <= (9 * 9); i) {
PUSH(sudoku[(9 * 9) - i]);
}
printf("%hhd%hhd%hhd%hhd%hhd\r╔═════════╦═════════╦═════════╗\n"
"║ %hhd %hhd %hhd ║ %hhd %hhd %hhd ║ %hhd %hhd %hhd ║\n"
"║ %hhd %hhd %hhd ║ %hhd %hhd %hhd ║ %hhd %hhd %hhd ║\n"
"║ %hhd %hhd %hhd ║ %hhd %hhd %hhd ║ %hhd %hhd %hhd ║\n"
"╠═════════╬═════════╬═════════╣\n"
"║ %hhd %hhd %hhd ║ %hhd %hhd %hhd ║ %hhd %hhd %hhd ║\n"
"║ %hhd %hhd %hhd ║ %hhd %hhd %hhd ║ %hhd %hhd %hhd ║\n"
"║ %hhd %hhd %hhd ║ %hhd %hhd %hhd ║ %hhd %hhd %hhd ║\n"
"╠═════════╬═════════╬═════════╣\n"
"║ %hhd %hhd %hhd ║ %hhd %hhd %hhd ║ %hhd %hhd %hhd ║\n"
"║ %hhd %hhd %hhd ║ %hhd %hhd %hhd ║ %hhd %hhd %hhd ║\n"
"║ %hhd %hhd %hhd ║ %hhd %hhd %hhd ║ %hhd %hhd %hhd ║\n"
"╚═════════╩═════════╩═════════╝\n");
for (uint8_t i = 0; i < (9 * 9); i) {
POP();
}
}
CodePudding user response:
No.
There are no C standards that define the semantics of inline asm, so different compilers can and do handle it differently. This can result in code that compiles without error under different compilers, but produces subtly different results.
https://gcc.gnu.org/wiki/DontUseInlineAsm
CodePudding user response:
This is super duper broken even for compilers that support the GNU C extensions you're using, full of undefined behaviour that you should expect to break in practice with either -O0 or -O3. And it's already broken in terms of printing the correct values, since you're not passing any register args.
You're assuming that all args are stack args, which is not true in either mainstream x86-64 calling convention. (Why does Windows64 use a different calling convention from all other OSes on x86-64?).
You're also assuming that whatever the compiler does to get ready for a function call will leave your stack args at the right place. For Windows x64, they go above the 32 bytes of shadow space, but pushing behind the compiler's back will probably happen after it reserves stack space in the function prologue. For x86-64 SysV, there's no shadow space, but RSP % 16 == 0 is still required before a call. Since you do an odd number of pushes (9*9 is odd), you're likely misaligning the stack.
You leave RSP modified after an asm statement, which is explicitly not allowed.
... the compiler requires the value of the stack pointer to be the same after an asm statement as it was on entry to the statement. However, previous versions of GCC did not enforce this rule and allowed the stack pointer to appear in the list, with unclear semantics. This behavior is deprecated and listing the stack pointer (in the clobber list) may become an error in future versions of GCC.
The compiler might use 12(%rsp) to access its own locals, e.g. with -fno-omit-frame-pointer. This is somewhat unlikely to be a problem in practice; it will probably either use RBP as a frame pointer to address locals in a -O0 debug build, or it will keep local vars in registers at -O1 or higher. But it's super hacky.
You used a "m" constraint for your push input, but that will probably address the value via a pointer in some other register rather than copying it to a local on the stack, unless this inlines into a caller where the sudoku array is a local it can reference relative to RSP. Normally you'd use "rme", but with a register you'd need %q0 to force the qword name of the register (%rax, not %al), assuming the value has type char.
As is, you're doing qword loads from presumably char objects, pushing 7 bytes of garbage along with each value. That is fine as long as the load doesn't happen to be near the end of a page with the next one unmapped; the calling convention does require the callee to ignore the high bytes of the arg in memory or a register. In this case, since you're calling printf, the args are int, and the %hhd conversion is defined to truncate that int to a char before printing.
You step on the red-zone, the 128 bytes below RSP. You'd need to compile with -mno-red-zone, or start out with sub $128, %rsp. (Or add $-128, %rsp to save code-size, allowing an imm8.) See "Inline assembly that clobbers the red zone": there's unfortunately no way to declare a clobber on the red-zone.
You're also assuming that compiler-generated use of the red-zone won't overwrite any of your pushes.
Any of these could become big problems after this function inlines into its caller and GCC schedules some code to interleave some work.
You could write the whole function in asm so you can safely use a loop to copy args to the stack, after setting the first 4 (Win x64) or 6 (x86-64 SysV) as register args. Write the function to take a pointer to an array of char or whatever, so you can do that once to get something to pass to it.
(There's no need to pop one at a time, just add $8 * n, %rsp!)
In pure C, some CPP macro hackery can probably expand one macro to nine comma-separated expressions so you can write this with a minimum of fuss, into one call to printf with all the args properly visible to the compiler. But without having to fully manually expand the loop.