Description of x86-64 illegal two byte opcodes

I recently wrote a C program to find two-byte illegal opcodes for x86-64 and pasted the output at https://pastebin.com/5xjjFea6

For example, here are some illegal two-byte opcodes:

0x0f,0x04  0x0f,0x0a  0x0f,0x0b  0x0f,0x0c  0x0f,0x0e  0x0f,0x0f  0x0f,0x24  0x0f,0x25  0x0f,0x26  0x0f,0x27  0x0f,0x36  0x0f,0x37  0x0f,0xaa

From Googling, I found descriptions for only a few:

0x0f,0x0b - ud2 - Generates an invalid opcode. 
 
0x0f,0x37 - getsec - Exit authenticated code execution mode.
 
0x0f,0xaa - rsm - Resume operation of interrupted program.

Questions

Is there documentation that explains most illegal opcodes?
Why do illegal opcodes exist?
Why aren't they treated as a NOP on x86-64 or assigned to something useful?

CodePudding user response：

#UD illegal-instruction faults exist to let you know you tried to run an instruction this CPU doesn't support.

For most new instructions, it's best if old CPUs #UD fault on them instead of silently doing the wrong thing (or nothing). e.g. getsec and rsm aren't instructions you can use in cases where they might run as NOPs instead of doing what you wanted. (Fun fact: 8086 didn't have a #UD fault; any byte sequence would execute as something.)

Only for speculative hints like prefetchw, or other cases where a fallback works, do you want old CPUs to run it as a NOP so code can use it without checking. You can often get that with REP in front of an existing instruction (like how pause is rep nop).
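The pause case can be checked at the byte level: pause is encoded as F3 90, where F3 is the REP prefix and 90 is the one-byte nop. A small sketch of what a CPU that ignores a meaningless REP prefix effectively decodes; skip_rep_prefixes is a hypothetical helper for illustration, not a real decoder:

```c
#include <stddef.h>
#include <stdint.h>

/* Encoding of pause: the REP prefix (F3) followed by the one-byte nop (90). */
static const uint8_t PAUSE_BYTES[] = { 0xF3, 0x90 };
static const uint8_t NOP_BYTE = 0x90;

/* Return the offset of the first byte that is not an F3 (REP) prefix.
 * This models only "ignore REP"; no other prefix is handled. */
size_t skip_rep_prefixes(const uint8_t *code, size_t len)
{
    size_t i = 0;
    while (i < len && code[i] == 0xF3)
        i++;
    return i;
}
```

Calling skip_rep_prefixes(PAUSE_BYTES, sizeof PAUSE_BYTES) returns 1, and the byte at that offset is NOP_BYTE, which is why CPUs that predate pause run it as a plain nop.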

There are enough NOP-equivalent opcodes for future use; it's best to leave the rest of the unused coding-space actually faulting.

From a CPU vendor's point of view, having some byte-sequences #UD fault means they can be sure existing code isn't (accidentally or on purpose) relying on them to work as long NOPs instead of using the recommended encodings.

Is there documentation that explains most illegal opcodes?

No, apart from the fact that Intel says the behaviour is unpredictable when executing anything the current manual doesn't explicitly define. I didn't check the exact wording of Intel's or AMD's manuals, but I'd assume they list a #UD fault as one of the possibilities.

Remember, they're not documenting specific CPUs, they're documenting the x86-64 ISA in ways that encourage people to create forward-compatible programs that will still work correctly on future CPUs where those byte sequences might be part of a new instruction that currently hasn't even been proposed yet.

A #UD fault is the common behaviour in practice for many of those byte sequences. But for other sequences, like a rep prefix on an instruction where it doesn't mean anything, CPUs often just ignore the rep prefix. That lets the CPU vendors introduce a new extension where rep xyz means something else (like rep bsf becoming tzcnt, which produces the same results for non-zero inputs), and at that point they can document the behaviour of previous CPUs as running it as just plain bsf. (Or, for pause, as running rep nop as a plain nop.)
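To make the "same results for non-zero inputs" claim concrete, here is a rough C model of the two instructions' register results. It is only a sketch: the function names are invented, flags are ignored, and the zero-input case of bsf is modelled as leaving the destination unchanged, which is what AMD documents (Intel documents the destination as undefined in that case).

```c
#include <stdint.h>

/* Model of tzcnt r32: number of trailing zero bits, 32 for a zero input. */
uint32_t model_tzcnt32(uint32_t x)
{
    uint32_t n = 0;
    if (x == 0)
        return 32;
    while ((x & 1u) == 0) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Model of bsf r32: index of the lowest set bit.
 * Assumption: for a zero input the old destination value is kept. */
uint32_t model_bsf32(uint32_t x, uint32_t old_dest)
{
    uint32_t n = 0;
    if (x == 0)
        return old_dest;
    while ((x & 1u) == 0) {
        x >>= 1;
        n++;
    }
    return n;
}
```

The two models agree for every non-zero x and differ only at x == 0, so code that uses the tzcnt encoding still gets the right answer on an old CPU as long as it never passes zero.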

There are multiple other examples, but the point is that the behaviour existed all along, and just gets retroactively documented for old CPUs when the rep xyz encoding gets documented to do something else when some CPUID feature bit is set.

What happens when a rep-prefix is attached to a non string instruction?
Are Intel TSX prefixes executed (safely) on AMD as NOP? (xacquire/xrelease are the same bytes as F2/F3 REP/REPNE prefixes)
What happens when you use a memory override prefix but all the operands are registers? (docs say "reserved", on current hardware it's ignored. The lock prefix will #UD when it doesn't apply, unlike others.)
What does "rep; nop;" mean in x86 assembly? Is it the same as the "pause" instruction?
Assembly x86 REP, REPZ, REPNZ, XACQUIRE and XRELEASE instructions
How did Pentium III CPUs handle multiple instruction prefixes from the same group?
What does `rep ret` mean? - a rare case of ignore-rep behaviour being widely relied on in a case where it isn't documented. So widely used (by GCC by default for years) that CPU vendors can't plausibly change it without stopping their CPUs from running GCC-compiled binaries, which would not be a commercially-viable option for a general-purpose CPU.
What methods can be used to efficiently extend instruction length on modern x86?

Why aren't they assigned to something useful?

There's basically no coding-space left in 32-bit mode, and no free one-byte opcodes. All extensions to date have used the same encoding in both 32-bit and 64-bit mode. That makes sense if you care about 32-bit mode: you want the decoders to only have to look for the same patterns. That's why the VEX (AVX) and EVEX (AVX-512) prefixes use some inverted bits, so they overlap with invalid encodings in 32-bit mode, and also why they're still limited to 8 vector registers in 32-bit mode.
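The inverted-bit trick can be sketched as a decoder check. In 32-bit mode the bytes C4, C5 and 62 are the opcodes of les, lds and bound, all of which require a memory operand, so a following byte whose top two bits are 11 (a register operand) would be invalid for them. The VEX and EVEX prefixes store their register-extension bits inverted in exactly those positions, so registers 0 to 7 produce 11 there. The function name below is made up, and this is only a rough illustration of that one check:

```c
#include <stdbool.h>
#include <stdint.h>

/* In 32-bit mode, decide whether the two bytes start a VEX/EVEX prefix
 * rather than les (C4), lds (C5) or bound (62) with a memory operand. */
bool is_vex_or_evex_in_32bit_mode(uint8_t first, uint8_t second)
{
    bool escape = (first == 0xC4) || (first == 0xC5) || (first == 0x62);

    /* Top two bits 11 would be a register operand, which the legacy
     * instructions do not allow, so that pattern is free for prefixes. */
    return escape && (second & 0xC0) == 0xC0;
}
```

For example, vaddps xmm0, xmm1, xmm2 is encoded as C5 F0 58 C2, and the pair C5 F0 passes the check, while C5 00 (the start of lds eax, [eax]) does not. Since those two bits must both be 1, the inverted extension bits can only select registers 0 to 7, which matches the 8-register limit in 32-bit mode.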

Unfortunately, there haven't been any extensions introducing new 64-bit-only instructions that use some of the freed-up 1-byte opcodes (BCD instructions like aaa, or push/pop of segment registers). There are several they could add, like a mov r/m32, sign_extended_imm8 to improve code density in 64-bit mode: that would shrink instructions like mov ecx, -2 or mov ecx, 1 from 5 bytes to 3, and mov rcx, -2 from 7 bytes to 4.
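The byte counts can be checked by adding up the parts of each encoding. The two existing forms below are the standard ones (B8+rd id for mov r32, imm32 and REX.W C7 /0 id for mov r/m64, imm32); the two HYP_ entries describe the hypothetical imm8 instruction from the paragraph above, assuming it would use a freed one-byte opcode plus a ModRM byte. All names are invented for this sketch.

```c
#include <stddef.h>

/* Byte counts of the parts of one x86 instruction encoding. */
struct encoding {
    unsigned rex;    /* 1 if a REX prefix is needed, else 0 */
    unsigned opcode; /* number of opcode bytes */
    unsigned modrm;  /* 1 if a ModRM byte is present, else 0 */
    unsigned imm;    /* number of immediate bytes */
};

/* Existing encodings. */
static const struct encoding MOV_R32_IMM32  = { 0, 1, 0, 4 }; /* mov ecx, -2 */
static const struct encoding MOV_RM64_IMM32 = { 1, 1, 1, 4 }; /* mov rcx, -2 */

/* Hypothetical encodings (assumption: one-byte opcode + ModRM + imm8). */
static const struct encoding HYP_MOV_RM32_IMM8 = { 0, 1, 1, 1 };
static const struct encoding HYP_MOV_RM64_IMM8 = { 1, 1, 1, 1 };

/* Total encoded length in bytes. */
size_t encoded_len(struct encoding e)
{
    return (size_t)e.rex + e.opcode + e.modrm + e.imm;
}
```

With these tables, encoded_len gives 5 and 7 bytes for the existing forms and 3 and 4 bytes for the hypothetical ones.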

But with the Microsoft model of software distribution (one binary that runs everywhere and detects CPU features to use in a few special functions), there's insufficient gain from a feature like that. It's the kind of thing that only helps a bit across a whole binary, so it only pays off in cases like gcc -march=native. BMI1/2 is like that: it mostly does in 1 instruction what could be done in 2, so it mostly helps when used in all functions in a program. But it's conceivable that a few bithack functions can get enough speedup to be worth it.
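The "detect CPU features in a few special functions" model looks roughly like the following sketch. It assumes GCC or Clang on x86, using their __builtin_cpu_supports builtin and target attribute; the function names are invented. Both versions have the same C source, and the only difference is that the compiler is allowed to use BMI1 instructions (such as blsr) in one of them.

```c
#include <stdint.h>

/* Portable version: compiled for the baseline, runs on any CPU. */
static uint32_t clear_lowest_generic(uint32_t x)
{
    return x & (x - 1);
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Same source, but the compiler may emit BMI1 instructions here. */
__attribute__((target("bmi")))
static uint32_t clear_lowest_bmi1(uint32_t x)
{
    return x & (x - 1);
}
#endif

/* Clear the lowest set bit, choosing an implementation at run time.
 * Real code would do the feature check once and cache a function pointer. */
uint32_t clear_lowest_set_bit(uint32_t x)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("bmi"))
        return clear_lowest_bmi1(x);
#endif
    return clear_lowest_generic(x);
}
```

For a one-instruction saving like this, the check and the call cost more than they gain, which is the point made above: this kind of extension only pays off when the whole program is compiled to assume it.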

Intel and AMD have had no interest in introducing extensions that won't be immediately useful, so extensions that would become most useful in a decade or so, once they're ubiquitous (widespread enough that some programs can assume them as part of their "baseline"), have been mostly ignored. Despite dipping their toes in those waters with BMI1/2, Intel continued to sell CPUs without those extensions for many years, including Skylake Pentium/Celeron and, I think, Silvermont-family CPUs (netbooks and low-power servers).

Some BMI1/2 instructions use VEX encodings, and disabling VEX decoding was, I think, how Intel disabled AVX in those pre-Ice Lake low-end CPUs, with BMI1/2 being a casualty. (Silvermont-family cores before Gracemont (the Alder Lake E-core) didn't support AVX at all, so they also didn't need to be able to decode VEX prefixes.)

AMD's original design for AMD64 was quite conservative, changing machine-code meanings as little as possible apart from removing some opcodes to free them up for possible future extensions. Clearly they didn't want to get stuck needing more decoder transistors in case AMD64 didn't catch on commercially. And to boost adoption by compilers and assemblers, they kept most of the warts of 32-bit mode, including setcc r/m8 instead of setcc r32/m8 so x86 still sucks at efficiently materializing an integer 0/1 from a compare condition.