Why do clang and gcc produce this sub-optimal output (copying a struct) for passing a pointer to a b

Time:07-06

I have a C program like:

#include <stdio.h>

struct sa {
    char buffer[24];
};

void proceed(const struct sa *data);

static inline void func(struct sa sa) {
    proceed(&sa);
}

void test(struct sa sa) {
    func(sa);
}

It seems that optimal assembly for the test function could pass the address of its sa argument directly to proceed, since proceed is guaranteed not to change the data. However, the compiler (both x86-64 clang 14.0 and gcc 12.1, at the -O3 optimization level) emits assembly like:

test:                                   # @test
        sub     rsp, 24
        mov     rax, qword ptr [rsp + 48]
        mov     qword ptr [rsp + 16], rax
        movaps  xmm0, xmmword ptr [rsp + 32]
        movaps  xmmword ptr [rsp], xmm0
        mov     rdi, rsp
        call    proceed
        add     rsp, 24
        ret

Note that in the output, the whole sa struct is copied from [rsp + 32] to [rsp]. Why doesn't the compiler eliminate this copy?

Answer:

This is pretty clearly a missed optimization bug since it only happens with that extra level of inlining func().

You can report bugs on https://github.com/llvm/llvm-project/issues and https://gcc.gnu.org/bugzilla/enter_bug.cgi?product=gcc (GCC bug reports prefer AT&T syntax, so select that in your Godbolt link; it's generally good to include a Godbolt link in a GCC bug report along with the actual code and asm, so it's quick for future readers to check if it's been fixed, or play around with it.)

For GCC, use the keyword missed-optimization.


"since the proceed function is guaranteed not to change data"

No, it's legal to cast away const because the original pointed-to object is not const. But it still doesn't need to copy; your function owns its stack arg space, and can let other functions modify the only copy if it wants. (Most calling conventions work this way, including all System V conventions such as x86-64 SysV which is in use here. I think Windows x64 does too: there, args larger than 8 bytes are passed by non-constant(?) reference to space reserved by the caller, with a pointer in a register, or on the stack if there are 4 or more args before it.)
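
As a minimal sketch of the cast-away-const point: the definition of proceed below is hypothetical (the question only declares it), and first_byte_after_proceed is a helper made up for illustration. Because the struct being pointed to is not itself declared const, writing through the pointer after the cast is well-defined C, so the const in the prototype is not a promise the caller can rely on.

```c
#include <string.h>

struct sa {
    char buffer[24];
};

/* Hypothetical definition of proceed(); the question only declares it.
   Casting away const and writing is well-defined here because the
   object the pointer refers to was not declared const. */
void proceed(const struct sa *data) {
    struct sa *writable = (struct sa *)data;
    writable->buffer[0] = 'X';
}

/* Hypothetical helper: fills a local, non-const struct with 'a',
   hands it to proceed(), and returns the first byte afterwards. */
char first_byte_after_proceed(void) {
    struct sa local;
    memset(&local, 'a', sizeof local);
    proceed(&local);
    return local.buffer[0];
}
```

The helper returns the byte that proceed wrote, not the 'a' the caller stored, which is why a compiler can't assume the pointed-to data survives the call.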

The caller of test can't assume it's unmodified, so another call with the same arg would need to re-copy the struct after this returned. Even foo(const struct sa); would work this way; there's no way to declare / promise that a function doesn't reuse its stack arg space for scratch space or args to tail-calls.
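
To illustrate the by-value semantics behind that, here is a small sketch in which a callee treats its struct argument as scratch space. The names scribble and caller_copy_survives are hypothetical, not from the question. At the C level the caller's object is unaffected, since each call receives its own copy; that copy made by the caller is exactly the one the callee is free to clobber or pass along by pointer.

```c
#include <string.h>

struct sa {
    char buffer[24];
};

/* Hypothetical callee: overwrites its by-value argument, which it owns,
   and returns the first byte of the overwritten copy. */
int scribble(struct sa arg) {
    memset(arg.buffer, 'z', sizeof arg.buffer);
    return arg.buffer[0];
}

/* Hypothetical caller: passes the same object twice. Each call gets a
   fresh copy, so the caller's own object still holds 'a' afterwards.
   Returns 1 if that holds and both calls saw their own 'z'. */
int caller_copy_survives(void) {
    struct sa original;
    memset(original.buffer, 'a', sizeof original.buffer);
    int seen = scribble(original);   /* first call: its own copy */
    seen += scribble(original);      /* second call: a re-made copy */
    return original.buffer[0] == 'a' && seen == 2 * 'z';
}
```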


This test-case on Godbolt demonstrates that it's a missed optimization: test will tailcall with just jmp func if it's noinline, not copying any args there. And that non-inline definition of func won't copy either, just the expected RSP alignment then lea rdi, [rsp + 16] / call proceed to pass a pointer to its stack arg.

So adding __attribute__((noinline)) to your func will result in your test calling proceed without copying the arg, with just an extra jmp in the path of execution. If that's legal, it would also be legal to do that when inlining func.

struct sa {
    char buffer[24];
};
void proceed(const struct sa *data);

__attribute__((noinline))
static void func(struct sa sa) {
    proceed(&sa);
}

void test_struct(struct sa sa) {
    func(sa);
}
// same as non-inline func()
// void test_struct_direct(struct sa sa) { proceed(&sa); }

# clang (trunk) -O3
# GCC is equivalent but uses sub/add instead of dummy push/pop
test_struct:
        jmp     func                            # TAILCALL
func:
        pushq   %rax                  # re-align the stack by 16
        leaq    16(%rsp), %rdi
        callq   proceed
        popq    %rax                  # clean up the stack
        retq

Feel free to shortlink that exact Godbolt link in your bug report, or with something commented or uncommented; it uses nightly builds of GCC and clang so devs will know it's not already fixed. Also feel free to link this Stack Overflow Q&A, but your bug report should be self-contained and point out that the optimization is legal, and that uncommenting __attribute__((noinline)) makes the difference.

(So probably for a bug report, you'd want noinline commented out, and uncomment the test_struct_direct version that manually inlines func.)
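
One possible layout of that bug-report variant, as a sketch only: noinline is commented out and test_struct_direct is uncommented, as suggested above. The stub definition of proceed, the proceed_calls counter and proceed_call_count are assumptions added only so the file links and runs as a standalone program; in an actual report, keep proceed as a bare declaration so the compiler can't see into it.

```c
struct sa {
    char buffer[24];
};
void proceed(const struct sa *data);

/* __attribute__((noinline)) */   /* per the answer, uncommenting this avoids the copy */
static void func(struct sa sa) {
    proceed(&sa);
}

/* The case from the question: func() gets inlined into this function. */
void test_struct(struct sa sa) {
    func(sa);
}

/* func() inlined by hand, for comparison in the compiler output. */
void test_struct_direct(struct sa sa) {
    proceed(&sa);
}

/* Assumption for this sketch only: a stub so the program links.
   Leave this out of the bug report. */
static int proceed_calls;

int proceed_call_count(void) {
    return proceed_calls;
}

void proceed(const struct sa *data) {
    (void)data;          /* the stub ignores the struct contents */
    proceed_calls++;
}
```

Both test functions have the same observable behavior (one call to proceed each), which is the point to make in the report: only the generated code should differ.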
