Why does MSVC generate nop instructions for atomic loads on x64?

If you compile code such as

#include <atomic>

int load(std::atomic<int> *p) {
    return p->load(std::memory_order_acquire) + p->load(std::memory_order_acquire);
}

you see that MSVC generates NOP padding after each memory load:

int load(std::atomic<int> *) PROC
        mov     edx, DWORD PTR [rcx]
        npad    1
        mov     eax, DWORD PTR [rcx]
        npad    1
        add     eax, edx
        ret     0

Why is this? Is there any way to avoid it without relaxing the memory order (which would affect the correctness of the code)?

Answer:

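On x64 the acquire load itself is just a plain mov, so the ordering only has to be enforced against compiler reordering. To do that, p->load() may eventually use the _ReadWriteBarrier compiler intrinsic.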
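As a rough illustration of what a barrier-based acquire load looks like, here is a minimal sketch. It is not the actual standard library source: the helper names compiler_barrier_sketch and acquire_load_sketch are made up for this example, and the non-MSVC branch uses std::atomic_signal_fence only so that the snippet also builds with other compilers. Note that _ReadWriteBarrier is deprecated, so MSVC may warn about it.

```cpp
#include <atomic>
#if defined(_MSC_VER)
#include <intrin.h>
#endif

// Hypothetical helper: stops the compiler from moving memory accesses
// across this point. It emits no fence instruction of its own.
inline void compiler_barrier_sketch() {
#if defined(_MSC_VER)
    _ReadWriteBarrier();  // MSVC compiler intrinsic (deprecated)
#else
    std::atomic_signal_fence(std::memory_order_seq_cst);  // portable stand-in
#endif
}

// Hypothetical helper: a plain load followed by a compiler barrier.
// Assumption: the target is x86/x64, where an ordinary load already has
// acquire semantics in hardware, so no fence instruction is needed.
inline int acquire_load_sketch(const volatile int *p) {
    int value = *p;             // expected to compile to a single mov
    compiler_barrier_sketch();  // later accesses must not be hoisted above the load
    return value;
}
```

The barrier comes after the load because acquire ordering only forbids later memory accesses from moving ahead of it.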

According to this Developer Community report:

https://developercommunity.visualstudio.com/t/-readwritebarrier-intrinsic-emits-unnecessary-code/1538997

the nops are inserted because of the /volatileMetadata flag, which is now on by default. You can return to the old behavior by adding /volatileMetadata-, but doing so will result in worse performance if your code is ever run under emulation (for example, an x64 binary running on Arm64 Windows). It’ll still be emulated correctly, but the emulator will have to pessimistically assume that every load/store needs a barrier.

And compiling with /volatileMetadata- does indeed remove the npad.
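To reproduce the comparison, something like the following should work. This is only a sketch: the file name load.cpp and the /O2 optimization level are assumptions made for this example, and /FA is used to have the compiler write an assembly listing.

```cpp
// load.cpp (file name chosen for this example)
//
// Assumed build commands, run from an x64 developer command prompt:
//   cl /c /O2 /FA load.cpp                     listing with default settings
//   cl /c /O2 /FA /volatileMetadata- load.cpp  listing with volatile metadata disabled
//
// Each command writes load.asm; compare the two listings for npad lines.
#include <atomic>

int load(std::atomic<int> *p) {
    // Same function as in the question: two acquire loads, summed.
    return p->load(std::memory_order_acquire) + p->load(std::memory_order_acquire);
}
```

The memory order stays acquire in both builds; only the compiler flag changes.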