I was looking at some examples with std::visit, and I wanted to explore the following common example code a bit:
#include <iostream>
#include <variant>
struct Fluid { };
struct LightItem { };
struct HeavyItem { };
struct FragileItem { };
template<class... Ts> struct overload : Ts... { using Ts::operator()...; };
template<class... Ts> overload(Ts...) -> overload<Ts...>; // line not needed in C++20...
int main() {
std::variant<Fluid, LightItem, HeavyItem, FragileItem> package(HeavyItem{});
std::visit(overload{
[](Fluid& ) { std::cout << "fluid\n"; },
[](LightItem& ) { std::cout << "light item\n"; },
[](HeavyItem& ) { std::cout << "heavy item\n"; },
[](FragileItem& ) { std::cout << "fragile\n"; }
}, package);
}
I've compiled the code with both GCC and MSVC, and I've noticed that with MSVC the amount of generated asm is an order of magnitude greater than with GCC.
Here the code compiled with GCC.
Here the code compiled with MSVC.
Is there a way to know why there's so much difference? Is there a way to make MSVC optimize so that its asm is similar to GCC's?
CodePudding user response:
MSVC /Os alone doesn't enable any(?) optimization; it just changes the tuning if you were to enable optimization. Code-gen is still like a debug build. Apparently it needs to be combined with other options to be usable? It's not like GCC -Os; for that, use MSVC -O1.
If you look at the asm source instead of the binary disassembly, it's easier to see that MSVC's main calls a constructor, std::variant<...>::variant<...>, zeros some memory, then calls std::visit. But GCC has obviously inlined it down to just a cout<< call.
MSVC also inlines and constant-propagates through std::visit if you tell it to fully optimize, with -O2 or -O1 instead of /Os. https://godbolt.org/z/5MdcYh9xn has a main that's about the same as GCC's, just calling cout's operator<< with the address of a constant.
MSVC's docs don't make it clear which options actually enable (some/any) optimization vs. just biasing the choices if some other option enables some optimization.
/O1 sets a combination of optimizations that generate minimum size code.
/O2 sets a combination of optimizations that optimizes code for maximum speed.
...
/Os tells the compiler to favor optimizations for size over optimizations for speed.
/Ot (a default setting) tells the compiler to favor optimizations for speed over optimizations for size. [But note that optimization in general is off by default, and this being the default doesn't change that. So /Os and /Ot don't seem to enable optimization at all.]
/Ox is a combination option that selects several of the optimizations with an emphasis on speed. /Ox is a strict subset of the /O2 optimizations.
If I hadn't tested, I'd have assumed from that doc that -Os would enable at least some optimizations. (MSVC accepts both - and / as the start of an option name; I wrote - in most of this answer because that's what Unix/Linux use and I know that MSVC accepts it.)
(MSVC always prints a ton of stuff in its asm source output, including stand-alone definitions for template functions that got inlined. I assume that's why you were using compile-to-binary to see what actually ended up in the linked executable. For some reason with a /O1 build on Godbolt, it can run but won't show disassembly: Cannot open compiler generated file [...]\output.s.obj. Or no, it's just intermittently broken for me, even with your original link.)
Simpler example
For example, this bar() becomes very simple after inlining, but MSVC /Os doesn't do that even for this trivial function. In fact, code-gen is identical to what you get with no options, the default debug mode.
int foo(int a,int b){ return a + b*5;}
int bar(int x){
return foo(3*x, 2*x);
}
; MSVC 19.32 /Os
int foo(int,int) PROC ; foo
mov DWORD PTR [rsp+16], edx
mov DWORD PTR [rsp+8], ecx
imul eax, DWORD PTR b$[rsp], 5
mov ecx, DWORD PTR a$[rsp]
add ecx, eax
mov eax, ecx
ret 0
int foo(int,int) ENDP ; foo
x$ = 48
int bar(int) PROC ; bar
$LN3:
mov DWORD PTR [rsp+8], ecx
sub rsp, 40 ; 00000028H
mov eax, DWORD PTR x$[rsp]
shl eax, 1
imul ecx, DWORD PTR x$[rsp], 3
mov edx, eax
call int foo(int,int) ; foo
add rsp, 40 ; 00000028H
ret 0
int bar(int) ENDP ; bar
It's not just a lack of inlining; note the spill of x and the two reloads when computing x*2 and x*3. The same goes for foo, which spills its args and reloads them, like a debug build. At first I thought it wasn't fully a debug build because it doesn't use RBP as a frame pointer, but MSVC generates identical asm with no options.
Compare that with a usable optimization level, MSVC -O1, where code-gen is very similar to GCC -O2 or -Os:
; MSVC 19.32 -O1
x$ = 8
int bar(int) PROC ; bar, COMDAT
imul eax, ecx, 13
ret 0
int bar(int) ENDP ; bar
a$ = 8
b$ = 16
int foo(int,int) PROC ; foo, COMDAT
lea eax, DWORD PTR [rcx+rdx*4]
add eax, edx
ret 0
int foo(int,int) ENDP ; foo