Intel JCC Erratum - what is the effect of prefixes used for mitigation?

Time:12-14

Intel recommends using instruction prefixes to mitigate the performance consequences of the JCC Erratum.

MSVC, when invoked with /QIntel-jcc-erratum, follows this recommendation and emits instructions padded with redundant prefixes, like this:

3E 3E 3E 3E 3E 3E 3E 3E 3E 48 8B C8   mov rcx,rax ; with redundant 3E prefixes

Reportedly, MSVC falls back to NOPs when prefixes cannot be used.

Clang has the -mbranches-within-32B-boundaries option for this, and it prefers NOPs, multi-byte if needed (https://godbolt.org/z/399nc5Msq; notice the xchg ax, ax).
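For context, both compiler options exist to keep jump instructions from crossing or ending on a 32-byte boundary, which is the condition the erratum is about. Here is a minimal Python sketch of that check and of the padding a compiler would need to place before the jump. The function names are made up for illustration, and the model is simplified: it treats the jump (or a macro-fused pair such as cmp+jcc) as one unit of known length shorter than 32 bytes.

```python
BOUNDARY = 32  # the erratum is about 32-byte aligned boundaries


def touches_boundary(start: int, length: int) -> bool:
    """True if bytes [start, start + length) cross or end on a 32-byte boundary."""
    end = start + length  # address of the first byte after the instruction
    crosses = start // BOUNDARY != (end - 1) // BOUNDARY
    ends_on = end % BOUNDARY == 0
    return crosses or ends_on


def padding_needed(start: int, length: int) -> int:
    """Smallest number of padding bytes (prefixes or NOPs) to insert before
    the jump so that it no longer touches a boundary. Assumes length < 32."""
    for pad in range(BOUNDARY):
        if not touches_boundary(start + pad, length):
            return pad
    raise ValueError("instruction too long to fit inside one 32-byte block")


# A 2-byte jcc at offset 30 ends exactly on the boundary at 32,
# so it has to be pushed into the next block.
print(padding_needed(30, 2))
```

Whether those padding bytes become 3E prefixes on an earlier instruction (MSVC) or NOPs (Clang) is exactly the choice the question is about; the amount of padding is the same either way.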

What are the consequences of 3E prefixes, specifically:

  • Why does Intel recommend this, and not multi-byte NOPs?
  • What are the consequences for unaffected CPUs?
  • Reportedly, a program runs faster with /QIntel-jcc-erratum on AMD; what could explain that?

CodePudding user response:

I now have a data point: benchmarking /QIntel-jcc-erratum on an AMD FX 8300 gives a bad result.

The slowdown is about an order of magnitude (roughly 10x) for a specific benchmark, whereas the benefit on Intel Skylake for the same benchmark is about 20 percent. This is consistent with Peter's comments:

I checked Agner Fog's microarch guide, and AMD Zen has no problem with any number of prefixes on a single instruction, like mainstream Intel since Core2. AMD Bulldozer-family has a "very large" penalty for decoding instructions with more than 3 prefixes, like 14-15 cycles for 4-7 prefixes

It's somewhat valid to consider Bulldozer-family obsolete enough to not care much about it, although there are still some APU desktops and laptops around for sure, but they'd certainly show large regressions in loops where the compiler put 4 or more prefixes on one instruction inside a hot inner loop (including existing prefixes like REX or 66h). Much worse than the 3% for MITE legacy decode on SKL.
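To make the "more than 3 prefixes" threshold concrete, here is a rough Python sketch that counts the prefix bytes at the start of an encoded instruction. It is not a real decoder: it assumes 64-bit mode (so bytes 40h-4Fh directly before the opcode are a REX prefix), ignores VEX/EVEX encodings, and the function name and the constant are made up for illustration. The limit of 3 is taken from the quoted comment about Bulldozer-family.

```python
# Legacy prefix bytes: LOCK, REPNE, REP, segment overrides,
# operand-size (66h) and address-size (67h).
LEGACY_PREFIXES = frozenset(
    {0xF0, 0xF2, 0xF3, 0x2E, 0x36, 0x3E, 0x26, 0x64, 0x65, 0x66, 0x67}
)

BULLDOZER_CHEAP_PREFIX_LIMIT = 3  # per the quoted comment, not measured here


def count_prefixes(code: bytes) -> int:
    """Count leading legacy prefixes plus an optional REX prefix.
    Assumes 64-bit mode and a single, non-VEX instruction."""
    n = 0
    while n < len(code) and code[n] in LEGACY_PREFIXES:
        n += 1
    if n < len(code) and 0x40 <= code[n] <= 0x4F:  # REX in 64-bit mode
        n += 1
    return n


# The padded instruction from the question: nine 3E prefixes, REX.W, mov.
padded = bytes.fromhex("3E 3E 3E 3E 3E 3E 3E 3E 3E 48 8B C8")
plain = bytes.fromhex("48 8B C8")  # mov rcx, rax without padding

for name, code in (("padded", padded), ("plain", plain)):
    n = count_prefixes(code)
    print(name, n, "over limit" if n > BULLDOZER_CHEAP_PREFIX_LIMIT else "ok")
```

By this count the padded mov from the question carries ten prefixes (nine 3E plus REX), well past the point where the quoted penalty would apply.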

Although Bulldozer-family is indeed obsolete-ish, I don't think I can afford this much of an impact. I'm also afraid that other CPUs may choke on extra prefixes in the same way. So my conclusion is not to use /QIntel-jcc-erratum for generally-targeted software, unless it is enabled only in specific translation units that are reached through dynamic dispatch, which is too much trouble most of the time.
