Consider the code below:
#include <cstdint>
#include <bit>
#include <utility>
struct A { uint32_t a[100]; };
struct B { uint16_t b[200]; };
void test(const A&);
void foo() {
B tmp;
test(std::bit_cast<A>(std::move(tmp)));
}
void bar() {
B tmp;
test(reinterpret_cast<A&>(tmp));
}
With Clang 15 at -O3, foo and bar compile to equivalent code, but with GCC 12.2 at -O3, foo performs a data copy (rep movsq):
foo():
sub rsp, 808
mov ecx, 50
lea rdi, [rsp+400]
mov rsi, rsp
rep movsq
lea rdi, [rsp+400]
call test(A const&)
add rsp, 808
ret
bar():
sub rsp, 408
mov rdi, rsp
call test(A const&)
add rsp, 408
ret
Which compiler option makes GCC optimize this the way Clang does? Thanks. P.S. -Ofast does not help here.
[Edit] Based on the answer provided by user17732522, I modified the code to be:
#include <cstdint>
#include <bit>
struct A { uint32_t a[100]; };
struct B { uint16_t b[200]; };
void test(const A&);
void foo(B arg) {
test(std::bit_cast<A>(arg));
}
void bar(B arg) {
test(reinterpret_cast<A&>(arg));
}
Now both GCC and Clang copy the data in foo. So it looks like std::bit_cast is not intended to cover this kind of case.
Answer:
Passing std::move(tmp) to std::bit_cast is pointless and has no effect at all, since std::bit_cast takes its argument by const lvalue reference and has no rvalue reference overload.
In your test case, tmp is never used in foo except to read (uninitialized!) data from it. It is therefore clearly a missed optimization that the compiler does not realize this object is not needed at all and that an uninitialized A could be used directly. This is not something you can solve at the language level; it is entirely up to the compiler's optimizer.
In fact, it seems that GCC intentionally does not eliminate the copy because you are reading uninitialized data. If you zero-initialize the array in B, then GCC produces the same output for both functions, without an additional copy in the std::bit_cast version.
There is not really anything you can do about this, but I don't see much value in the test case anyway. You could just declare an A directly and have the same effect.