If you compile code such as
#include <atomic>
int load(std::atomic<int> *p) {
return p->load(std::memory_order_acquire) p->load(std::memory_order_acquire);
}
you see that MSVC generates NOP padding after each memory load:
int load(std::atomic<int> *) PROC
mov edx, DWORD PTR [rcx]
npad 1
mov eax, DWORD PTR [rcx]
npad 1
add eax, edx
ret 0
Why is this? Is there any way to avoid it without relaxing the memory order (which would affect the correctness of the code)?
CodePudding user response:
p->load()
may eventually use the _ReadWriteBarrier
compiler intrinsic.
According to this: https://developercommunity.visualstudio.com/t/-readwritebarrier-intrinsic-emits-unnecessary-code/1538997
the nops get inserted because of the flag /volatileMetadata which is now on by default. You can return to the old behavior by adding /volatileMetadata-, but doing so will result in worse performance if your code is ever run emulated. It’ll still be emulated correctly, but the emulator will have to pessimistically assume every load/store needs a barrier.
And compiling with /volatileMetadata-
does indeed remove the npad
.