How to let gcc optimize std::bit_cast with std::move?

Consider the code below:

#include <cstdint>
#include <bit>
#include <utility>

struct A { uint32_t a[100]; };
struct B { uint16_t b[200]; };

void test(const A&);

void foo() {
    B tmp;
    test(std::bit_cast<A>(std::move(tmp)));
}

void bar() {
    B tmp;
    test(reinterpret_cast<A&>(tmp));
}

With Clang 15 at -O3, foo and bar compile to equivalent code, but with GCC 12.2 at -O3, foo performs an extra data copy (rep movsq):

foo():
        sub     rsp, 808
        mov     ecx, 50
        lea     rdi, [rsp+400]
        mov     rsi, rsp
        rep movsq
        lea     rdi, [rsp+400]
        call    test(A const&)
        add     rsp, 808
        ret
bar():
        sub     rsp, 408
        mov     rdi, rsp
        call    test(A const&)
        add     rsp, 408
        ret

Which compiler option makes GCC optimize this the way Clang does? Thanks. P.S. -Ofast does not help here.

[Edit] Based on the answer provided by user17732522, I modified the code as follows:

#include <cstdint>
#include <bit>

struct A { uint32_t a[100]; };
struct B { uint16_t b[200]; };

void test(const A&);

void foo(B arg) {
    test(std::bit_cast<A>(arg));
}

void bar(B arg) {
    test(reinterpret_cast<A&>(arg));
}

Now both GCC and Clang emit a data copy for foo. So it looks like std::bit_cast is not intended to cover this kind of case.

Answer:

Passing std::move(tmp) to std::bit_cast is pointless and has no effect at all, since std::bit_cast takes its argument by const lvalue reference (const From&) and has no rvalue-reference overload.

In your test case, tmp is never used in foo except to read (uninitialized!) data from it. It is therefore clearly a missed optimization by the compiler not to realize that this object is not needed at all and that an uninitialized A could be used directly. This is not something you can solve at the language level; it is entirely up to compiler optimization.

In fact, it seems that GCC intentionally does not eliminate the copy because you are reading uninitialized data. If you zero-initialize the array in B, GCC produces the same output for both functions, without the additional copy in the std::bit_cast version.

There is not really anything you can do about this, but I don't see much value in the test case anyway: you could just declare an A directly and get the same effect.
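A sketch of declaring an A directly: no B, no cast, and no copy are involved. Here tmp is value-initialized with {} so the runnable sketch never reads indeterminate values (the question's version leaves it uninitialized); baz, calls and last_sum are hypothetical names, and test has a trivial body only so the snippet is self-contained.

```cpp
#include <cassert>
#include <cstdint>

struct A { std::uint32_t a[100]; };

// Hypothetical stand-in for the question's external test(const A&).
int calls = 0;
std::uint64_t last_sum = 0;

void test(const A& x) {
    std::uint64_t s = 0;
    for (std::uint32_t w : x.a) s += w;
    last_sum = s;
    ++calls;
}

// Declare the A that test() expects and pass it straight through.
void baz() {
    A tmp{};  // value-initialized: all 100 words are zero
    test(tmp);
}
```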