mwaitx instruction not blocking

Time:02-24

According to AMD's official docs, the mwaitx instruction can be used with the monitorx instruction to monitor an address range and wait until it is modified. My code returns immediately instead of waiting, apparently doing nothing.

The code in question:

push rbx
push r9
mov r9, rcx ;save parameter for later
mov eax, 80000001h ;check cpuid flag for monitorx, which is required regardless of whether or not we look at it
cpuid
mov rax, r9 ;the address to monitor
xor rcx, rcx ;no extensions
xor rdx, rdx ;no hints
monitorx rax, rcx, rdx
xor rax, rax ;no hints
mov rcx, 10b ;timeout specified in rbx, no interrupts
xor rbx, rbx ;infinite timeout
mwaitx rax, rcx, rbx ;not waiting; passes immediately
pop r9
pop rbx
ret

The C++ code that calls it:

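#include <windows.h>
#include <iostream>

// Assumed declaration of the assembly routine above; the question does not show it.
extern "C" void monitorAddress(void* address);

// Assumed value; the question does not show how size is defined.
constexpr SIZE_T size = 4096;

int main()
{
    void* data = VirtualAlloc(nullptr, size, MEM_COMMIT, PAGE_READONLY);
    //int x = 5;
    std::cout << data << '\n';
    monitorAddress(data);
    std::cout << data << '\n';

    VirtualFree(data, 0, MEM_RELEASE);
    return 0;
}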

CodePudding user response:

The documentation in AMD's manual (vol 3, rev 3.33) does not say that ECX[0] = 0 will mask interrupts even if IF=1 in E/RFLAGS. It would be insane for user-space to be able to do that without an IO privilege level high enough to run a cli instruction (IOPL >= CPL), and the wording doesn't really hint at it.

In user-space, there should be no way to get a CPU stuck in a way that would make it hard for the kernel to wake it up! If you want to go for longer before asking the OS to put this thread to sleep (e.g. with Linux futex to wake you back up on memory change), you could use it in a loop exactly like a spin-wait loop that uses pause or something. From the OS's perspective it'd be the same: this thread is occupying the CPU for the entire time.

It's likely that your code does actually arm the monitor and enter the optimized sleep state, but wakes on the next timer interrupt after at most a few milliseconds. Check with rdtsc to see how long it sleeps for, because human perception of screen output can't distinguish that from failing to sleep at all.

What the documentation actually does say about the supported extension flags in ECX:

Bit 0: When set, allows interrupts to wake MWAITX, even when eFLAGS.IF = 0. Support for this extension is indicated by a feature flag returned by the CPUID instruction.

So, as an extension, you can override the fact that interrupts are disabled in eFLAGS, to make sure you don't enter a sleep state that lasts until an NMI. Otherwise, with ECX[0] = 0, all previous stuff in the documentation applies, including:

Events that cause an exit from the monitor event pending state include:

  • A store from another processor matches the address range established by the MONITORX instruction.
  • The timer expires.
  • Any unmasked interrupt, including INTR, NMI, SMI, INIT.
  • RESET.
  • Any far control transfer that occurs between the MONITORX and the MWAITX.

If you actually did want to put the CPU into a sleep that wouldn't be ended by pending interrupts, you'd use cli before monitorx / mwaitx. That requires either kernel mode proper (where you could also use traditional monitor / mwait), or user-space with a raised IO privilege level, e.g. after a Linux iopl() system call. With IOPL=3 and CPL=3 (current privilege level) you still can't run privileged instructions in general, only the specific ones gated by the IO privilege level: in / out and cli / sti.

Unfortunately:

There is no indication after exiting MWAITX of why the processor exited or if the timer expired. It is up to software to check whether the awaiting store has occurred, and if not, determining how much time has elapsed if it wants to re-establish the MONITORX with a new timer value.
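The re-check the manual describes could be sketched as the loop below. It is only an illustration of the control flow: wait_for_change, arm_and_wait and now are hypothetical names, with arm_and_wait standing in for a MONITORX + MWAITX pair (with a timer) and now standing in for a TSC read. Injecting them as callables keeps the logic testable on CPUs without MWAITX.

```cpp
#include <cstdint>

// Sketch: wait until *addr differs from `old`, or until `budget` ticks pass.
// arm_and_wait(addr, ticks) is a hypothetical stand-in for MONITORX + MWAITX
// with a timeout; now() is a hypothetical stand-in for reading the TSC.
// Note: the real timeout register (EBX) is 32 bits, so a real arm_and_wait
// would have to clamp `ticks`. It should also re-check *addr between
// MONITORX and MWAITX so a store landing before the monitor is armed
// is not slept through.
template <class ArmAndWait, class Now>
bool wait_for_change(const volatile std::uint32_t* addr, std::uint32_t old,
                     std::uint64_t budget, ArmAndWait arm_and_wait, Now now) {
    const std::uint64_t start = now();
    for (;;) {
        if (*addr != old) return true;        // the store we were waiting for
        const std::uint64_t elapsed = now() - start;
        if (elapsed >= budget) return false;  // overall time budget used up
        // MWAITX gives no exit reason, so after any wake-up (interrupt,
        // timer, store) we simply loop, and re-arm with the remaining time.
        arm_and_wait(addr, budget - elapsed);
    }
}
```

From the OS's point of view this is still a spin-wait: the thread occupies the core for the whole budget, it just spends most of that time in a low-power state.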

BTW, if you don't want the timer to be a possible exit condition, you can just leave ECX[1] = 0. (Your code sets ECX[1] but passes EBX = 0, which the manual says has the same effect.)

Bit 1: When set, EBX contains the maximum wait time expressed in Software P0 clocks, the same clocks counted by the TSC. Setting bit 1 but passing in a value of zero on EBX is equivalent to setting bit 1 to a zero. The timer will not be an exit condition.
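To make the two ECX extension bits concrete, here is a minimal sketch of how they combine with EBX, based only on the manual text quoted above. The constant and function names are made up for illustration; they do not come from AMD's manual or any header.

```cpp
#include <cstdint>

// Illustrative names for the MWAITX extension bits in ECX.
constexpr std::uint32_t kWakeOnMaskedInterrupt = 1u << 0; // ECX[0]
constexpr std::uint32_t kTimerEnable           = 1u << 1; // ECX[1]

// The timer can end the wait only when ECX[1] is set AND EBX is nonzero.
// ECX[1] = 1 with EBX = 0 behaves like ECX[1] = 0.
constexpr bool timer_is_exit_condition(std::uint32_t ecx, std::uint32_t ebx) {
    return (ecx & kTimerEnable) != 0 && ebx != 0;
}

// The question's register values: ECX = 10b, EBX = 0, so no timer at all.
static_assert(!timer_is_exit_condition(0b10, 0),
              "a zero timeout disables the timer");
static_assert(timer_is_exit_condition(kTimerEnable, 1'000'000),
              "a nonzero timeout arms it");
static_assert(!timer_is_exit_condition(kWakeOnMaskedInterrupt, 1'000'000),
              "EBX is ignored unless ECX[1] is set");
```

So the question's "infinite timeout" setup is really "no timeout", which leaves interrupts and stores as the only wake-up sources.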

And BTW, EAX=0 isn't "no hints"; EAX[7:4] is always the desired C-state level, encoded as C-state - 1. So EAX=0 hints that you want C1 state. (To hint that you want C0 state, a less deep sleep that's faster to wake from, you'd set EAX = 0xf0, because 0xF + 1 wraps around to 0 in the 4-bit field.)
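The hint encoding is easy to get backwards, so here is a small sketch of it. mwaitx_hint is a hypothetical helper, and it fills in only EAX[7:4], leaving every other bit zero.

```cpp
#include <cstdint>

// Hypothetical helper: encode a target C-state into the MWAITX hint in EAX.
// EAX[7:4] holds (C-state - 1) modulo 16; all other bits are left at 0 here.
constexpr std::uint32_t mwaitx_hint(unsigned cstate) {
    return ((cstate - 1u) & 0xFu) << 4;
}

static_assert(mwaitx_hint(1) == 0x00, "EAX = 0 requests C1");
static_assert(mwaitx_hint(0) == 0xF0, "C0 is encoded as 0xF: 0 - 1 wraps in 4 bits");
static_assert(mwaitx_hint(2) == 0x10, "C2 is encoded as 1");
```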


It's also pointless to do xor rax,rax instead of xor eax,eax; writing a 32-bit register implicitly zeroes the upper bits of the full 64-bit register, so there's no false dependency. And there's no need to tempt the assembler into wasting a REX prefix to actually encode it as written. The MWAITX implicit input registers are all 32-bit anyway, so xor ecx, ecx would also be appropriate.

Also, r9 is call-clobbered (aka volatile) in the Windows x64 calling convention; you can just use it without saving/restoring, along with r8..r11.

And no, you don't have to run a cpuid every time you want to do monitorx / mwaitx! AMD's documentation says you need to check once per program / library init, but there's no way the CPU can actually enforce that. It's not going to track across context switches which user-space process has actually run a CPUID.
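A once-per-program check might look like the sketch below. It assumes the MONITORX feature flag is CPUID Fn8000_0001 ECX bit 29, and uses the compiler-provided CPUID helpers (__cpuid on MSVC, __get_cpuid from <cpuid.h> on GCC/Clang). The names kMonitorxBit, query_monitorx_support and has_monitorx are made up for illustration.

```cpp
#include <cstdint>
#if defined(_MSC_VER)
#include <intrin.h>
#else
#include <cpuid.h>
#endif

// Assumption: CPUID Fn8000_0001 ECX bit 29 reports MONITORX/MWAITX support.
constexpr std::uint32_t kMonitorxBit = 1u << 29;

inline bool query_monitorx_support() {
#if defined(_MSC_VER)
    int regs[4];
    __cpuid(regs, static_cast<int>(0x80000000u));   // highest extended leaf
    if (static_cast<unsigned>(regs[0]) < 0x80000001u) return false;
    __cpuid(regs, static_cast<int>(0x80000001u));
    return (static_cast<std::uint32_t>(regs[2]) & kMonitorxBit) != 0;
#else
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    // __get_cpuid returns 0 when the requested leaf is not supported.
    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx)) return false;
    return (ecx & kMonitorxBit) != 0;
#endif
}

// CPUID runs only on the first call; later calls reuse the cached answer.
inline bool has_monitorx() {
    static const bool supported = query_monitorx_support();
    return supported;
}
```

The assembly routine can then be called only when has_monitorx() returns true, with no cpuid inside the wait path itself.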

;; uint32_t waitx(void *p)
;; returns TSC ticks actually slept for
waitx:
    mov   rax, rcx    ;the address to monitor
    xor   ecx, ecx    ;no extensions
    xor   edx, edx    ;no hints
    monitorx rax, ecx, edx    ; or just monitorx if your assembler complains

    lfence
    rdtsc
    lfence            ; make sure we're done reading the clock before executing later instructions
    mov   r8d, eax    ; low half of start time.   We ignore the high half in EDX, assuming the time is less than 2^32, i.e. less than about 1 second.

    xor   eax, eax    ; hint = EAX[7:4] = 0 = C1 sleep state
                      ; ECX still 0: no TSC timeout in EBX, no overriding IF to 1
    mwaitx  eax, ecx

    rdtscp            ; EAX = low half of end time, EDX = high, ECX = core #
    sub     eax, r8d  ; retval = end - start
    ret

(LFENCE serializes execution on AMD CPUs if the OS has enabled the Spectre mitigation feature bit, giving lfence that guarantee like on Intel CPUs. Otherwise it's a NOP on AMD, IIRC.)
