Clarifications about SIMD in C

Time:12-20

This is what I know about SIMD. Single-instruction-multiple-data is a way of processing data that performs the same instruction over vectors of multiple values. SIMD is implemented at different levels depending on the processor of the machine (SSE, SSE2, NEON...), and every level provides a different instruction set.

We can use these instruction sets by including immintrin.h. What I haven't really understood is this: when actually developing something with SIMD, should we care about checking which instruction sets are supported? What are the best practices when developing such programs? What should we do if, for example, an instruction set is not supported: should we provide a non-SIMD alternative, or does the compiler unvectorise the whole thing for us?

CodePudding user response:

Of course we need to check which ISA is supported: if the program executes an instruction the CPU doesn't implement, it will be killed with an illegal-instruction signal (SIGILL on POSIX systems). Checking also lets us optimize for each architecture. For example, on CPUs with AVX-512 we can use AVX-512 for better performance, and on an older CPU we can fall back to the version appropriate for that architecture.

What are the best practices when developing such programs?

There are no general best practices. It depends on the situation, because each compiler offers different tools for this:

  • If your compiler doesn't support dynamic dispatching, you need to write separate code for each ISA and call the version that matches the current platform.
  • Some compilers automatically dispatch to the version optimized for the running platform. For example, ICC can compile a hot loop into separate SSE/AVX/AVX-512 versions and jump to the right one at run time for maximum performance.
  • Some other compilers can compile a single function into several versions and dispatch automatically, but you need to specify which functions you want treated this way. For example, in GCC, Clang and ICC you can use the attributes target and target_clones. See "Building backward compatible binaries with newer CPU instructions support".
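As a rough illustration of the last bullet, here is a minimal sketch using target_clones. The function name add_arrays and the MULTIVERSION macro are made up for this example, and the guard assumes GCC or a recent Clang on x86-64 Linux, where the dispatch mechanism the attribute relies on is available. On other targets the macro expands to nothing and you just get one ordinary function.

```c
#include <stddef.h>

/* Assumption: target_clones needs GCC (or a recent Clang) on a platform
   with ifunc support, such as x86-64 Linux with glibc. Elsewhere the
   macro is empty and only one plain version is built. */
#if defined(__x86_64__) && defined(__linux__) && defined(__GNUC__)
#define MULTIVERSION __attribute__((target_clones("avx2", "sse4.1", "default")))
#else
#define MULTIVERSION
#endif

/* The compiler emits one clone per listed target plus a resolver that
   picks the best clone for the CPU the program is running on.
   The loop itself is plain C; vectorising it is left to the optimiser. */
MULTIVERSION
void add_arrays(float *dst, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

Callers just call add_arrays as usual. Whether each clone actually gets vectorised depends on the optimisation level you build with.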

CodePudding user response:

should we care about checking which instruction sets are supported?

Usually yes, but not always. If you compile 64-bit code for PCs, you’re guaranteed to have SSE1 and SSE2: both are part of the AMD64 instruction set, so every 64-bit x86 CPU supports them.
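One way to see this guarantee from inside the code is through the compiler's predefined macros. The sketch below is only an illustration: the helper name built_with_sse2 is made up, and it relies on GCC/Clang defining __SSE2__ and on MSVC defining _M_X64 or _M_IX86_FP.

```c
/* Returns 1 if this translation unit was compiled with SSE2 enabled,
   0 otherwise. This is a compile-time property of the build, not a
   run-time check of the CPU. */
static int built_with_sse2(void)
{
#if defined(__SSE2__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
    return 1;   /* always the case for 64-bit x86 builds */
#else
    return 0;   /* e.g. a 32-bit x86 build without SSE2, or non-x86 */
#endif
}
```

For a 64-bit x86 build this returns 1 without any extra compiler flags, which is why SSE2 intrinsics can be used there unconditionally.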

What are the best practices when developing such programs?

Negotiate with people about the minimum hardware requirements for the software you’re working on. If you have no boss, client, or users, find some stats and make educated guesses. Steam publishes useful statistics for PC gamers who have its software installed: expand “other settings” and you’ll see the percentage of users whose CPUs support each instruction set.

Personally, I think that now, in 2021, it’s generally OK to require SSE up to and including SSE 4.1 and fail at startup if it is not supported, provided you do that gracefully: state it in the hardware requirements, and at runtime show end users a comprehensible error message about the unsupported CPU.
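A startup check along those lines might look like the sketch below. It assumes GCC or Clang on x86, where __builtin_cpu_supports is available. The function name cpu_meets_requirements and the wording of the message are made up for this example.

```c
#include <stdio.h>

/* Returns 1 if the CPU satisfies the minimum requirement (SSE4.1 here),
   0 otherwise. On failure it prints a message the user can understand. */
int cpu_meets_requirements(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();   /* harmless if the runtime already did this */
    if (!__builtin_cpu_supports("sse4.1")) {
        fprintf(stderr,
                "This program requires a CPU with SSE4.1 support.\n");
        return 0;
    }
    return 1;
#else
    /* Assumption: on other compilers or architectures this sketch
       performs no check and lets the program continue. */
    return 1;
#endif
}
```

You would call it first thing in main and exit with a failure status if it returns 0. The check itself has to live in code compiled without SSE4.1 enabled, otherwise the program may crash before it gets to print anything.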

should we provide a non-SIMD alternative

99% of new computers sold in the last decade have at least 4GB of RAM and a 64-bit OS. I think for most projects it’s OK to ship only 64-bit binaries. This gives you SSE 1 and 2, so there is no need for scalar alternatives.

Sometimes, when I need to support SSE-only CPUs but AVX brings too much of a performance gain to give up, I do implement a couple of alternatives and a runtime dispatch.
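To give an idea of what that can look like, here is a minimal sketch, not the code I actually ship. It assumes GCC or Clang on x86-64. All function names (scale, scale_sse, scale_avx and so on) are invented for the example, and the target attribute is used so the file builds without -mavx.

```c
#include <stddef.h>

/* Plain C version: handles loop tails and is the fallback on non-x86. */
static void scale_scalar(float *data, size_t n, float factor)
{
    for (size_t i = 0; i < n; ++i)
        data[i] *= factor;
}

#if defined(__x86_64__) && defined(__GNUC__)
#include <immintrin.h>

/* SSE version: SSE is part of the x86-64 baseline, no check needed. */
static void scale_sse(float *data, size_t n, float factor)
{
    const __m128 f = _mm_set1_ps(factor);
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        _mm_storeu_ps(data + i, _mm_mul_ps(_mm_loadu_ps(data + i), f));
    scale_scalar(data + i, n - i, factor);   /* remaining 0..3 elements */
}

/* AVX version: the attribute enables AVX for this one function only. */
__attribute__((target("avx")))
static void scale_avx(float *data, size_t n, float factor)
{
    const __m256 f = _mm256_set1_ps(factor);
    size_t i = 0;
    for (; i + 8 <= n; i += 8)
        _mm256_storeu_ps(data + i,
                         _mm256_mul_ps(_mm256_loadu_ps(data + i), f));
    scale_scalar(data + i, n - i, factor);   /* remaining 0..7 elements */
}
#endif

typedef void (*scale_fn)(float *, size_t, float);

/* Picks the best implementation for the CPU we are running on. */
static scale_fn pick_scale(void)
{
#if defined(__x86_64__) && defined(__GNUC__)
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx") ? scale_avx : scale_sse;
#else
    return scale_scalar;
#endif
}

/* Public entry point. The lazy initialisation is not thread-safe;
   in a threaded program, resolve the pointer once at startup instead. */
void scale(float *data, size_t n, float factor)
{
    static scale_fn impl = NULL;
    if (!impl)
        impl = pick_scale();
    impl(data, n, factor);
}
```

The rest of the program only ever calls scale, so the choice of instruction set stays in one place. The cost is one indirect call per invocation, which is why this pays off for functions that process a whole buffer rather than a few values.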
