I have a C program like:
#include <stdio.h>

struct sa {
    char buffer[24];
};

void proceed(const struct sa *data);

static inline void func(struct sa sa) {
    proceed(&sa);
}

void test(struct sa sa) {
    func(sa);
}
It seems that in the optimal assembly output for the test function, the address of its sa argument could be passed directly to proceed, since proceed is guaranteed not to change data. However, the compiler (both x86-64 clang 14.0 and gcc 12.1, at the -O3 optimization level) emits assembly like:
test:                                   # @test
        sub     rsp, 24
        mov     rax, qword ptr [rsp + 48]
        mov     qword ptr [rsp + 16], rax
        movaps  xmm0, xmmword ptr [rsp + 32]
        movaps  xmmword ptr [rsp], xmm0
        mov     rdi, rsp
        call    proceed
        add     rsp, 24
        ret
Note that in the output, the whole sa struct is copied from [rsp + 32] to [rsp]. Why does the compiler not eliminate this copy?
Answer:
This is pretty clearly a missed-optimization bug, since it only happens with that extra level of inlining func().
You can report bugs on https://github.com/llvm/llvm-project/issues and https://gcc.gnu.org/bugzilla/enter_bug.cgi?product=gcc (GCC bug reports prefer AT&T syntax, so select that in your Godbolt link; it's generally good to include a Godbolt link in a GCC bug report along with the actual code and asm, so it's quick for future readers to check if it's been fixed, or play around with it.)
For GCC, use the keyword missed-optimization.
"since the proceed function is guaranteed not to change data"
No, it's legal to cast away const because the original pointed-to object is not const. But it still doesn't need to copy; your function owns its stack arg space, and can let other functions modify the only copy if it wants. (Most calling conventions work this way, including all System V conventions such as x86-64 SysV, which is in use here. I think the same is true of Windows x64, where args larger than 8 bytes are passed by non-constant(?) reference to space reserved by the caller, with a pointer in a register, or on the stack if there are 4 or more args before it.)
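To make the point about const concrete, here is a minimal sketch of what a definition of proceed is allowed to do. The body of proceed and the helper first_byte_after_proceed are hypothetical (the question only declares proceed); they show that the const in the parameter type is not a promise the compiler can rely on.

```c
#include <string.h>

struct sa {
    char buffer[24];
};

/* Hypothetical definition of proceed(): the question only declares it.
   The const qualifier on the pointee does not prevent this function from
   writing through the pointer. The write is well-defined as long as the
   object pointed to was not itself defined as const. */
void proceed(const struct sa *data) {
    struct sa *writable = (struct sa *)data;   /* cast away const */
    memset(writable->buffer, 'x', sizeof writable->buffer);
}

/* Hypothetical helper: passes its own by-value copy to proceed() and
   reports what ended up in the first byte of that copy. */
char first_byte_after_proceed(struct sa sa) {
    proceed(&sa);        /* sa is a non-const object, so the write is legal */
    return sa.buffer[0];
}
```

Calling first_byte_after_proceed on a struct changes only the callee's copy; the object in the caller keeps its old contents, which is all that C's pass-by-value semantics require.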
The caller of test can't assume its copy of the arg is unmodified, so another call with the same arg would need to re-copy the struct after this one returned. Even foo(const struct sa); would work this way; there's no way to declare or promise that a function doesn't reuse its stack arg space as scratch space or for args to tail-calls.
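At the C source level, the same rule looks like this. The sketch below uses two hypothetical functions, sum_and_clobber and both_calls_agree, which are not part of the question: the callee overwrites its by-value parameter while using it, so the caller has to hand over a fresh copy for every call.

```c
#include <stddef.h>

struct sa {
    char buffer[24];
};

/* Hypothetical callee: sums the bytes of its argument and zeroes them as
   it goes, i.e. it treats its by-value parameter as scratch space. */
int sum_and_clobber(struct sa sa) {
    int sum = 0;
    for (size_t i = 0; i < sizeof sa.buffer; i++) {
        sum += sa.buffer[i];
        sa.buffer[i] = 0;    /* modifies only the callee's own copy */
    }
    return sum;
}

/* Hypothetical caller: each call receives a new copy of *src, so the
   second call sees the original bytes even though the first call zeroed
   the copy it was given. Returns 1 if both calls computed the same sum. */
int both_calls_agree(const struct sa *src) {
    int first = sum_and_clobber(*src);
    int second = sum_and_clobber(*src);
    return first == second;
}
```

Because the callee may do this, a caller that passes the struct on the stack cannot reuse the bytes it wrote for a second call; it has to store them again.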
This test-case on Godbolt demonstrates that it's a missed optimization: test will tailcall with just jmp func if func is noinline, not copying any args there. And that non-inline definition of func won't copy either, just the expected RSP alignment then lea rdi, [rsp + 16] / call proceed to pass a pointer to its stack arg.
So adding __attribute__((noinline)) to your func will result in your test calling proceed without copying the arg, with just an extra jmp in the path of execution. If that's legal, it would also be legal to do that when inlining func.
struct sa {
    char buffer[24];
};

void proceed(const struct sa *data);

__attribute__((noinline))
static void func(struct sa sa) {
    proceed(&sa);
}

void test_struct(struct sa sa) {
    func(sa);
}

// same as non-inline func()
// void test_struct_direct(struct sa sa) { proceed(&sa); }
# clang (trunk) -O3
# GCC is equivalent but uses sub/add instead of dummy push/pop
test_struct:
        jmp     func                    # TAILCALL
func:
        pushq   %rax                    # re-align the stack by 16
        leaq    16(%rsp), %rdi
        callq   proceed
        popq    %rax                    # clean up the stack
        retq
Feel free to shortlink that exact Godbolt link in your bug report, or with something commented or uncommented; it uses nightly builds of GCC and clang so devs will know it's not already fixed. Also feel free to link this Stack Overflow Q&A, but your bug report should be self-contained and point out that the optimization is legal, and that uncommenting __attribute__((noinline)) makes the difference.
(So probably for a bug report, you'd want noinline commented out, and uncomment the test_struct_direct version that manually inlines func.)