_mm_loadu_si32 not recognized by GCC on Ubuntu

Time:07-03

When I try to use _mm_loadu_si32, VS Code gives me the error message:
a value of type "int" cannot be used to initialize an entity of type "__m128i"
When trying to compile, I get the error message:
implicit declaration of function '_mm_loadu_si32'

The weird part is that a couple of lines before _mm_loadu_si32, I'm using _mm_loadu_si128 without any problems. _mm_loadu_si64 also works.
Also, on Windows, my program compiles.

I ran sudo apt-get update and sudo apt-get upgrade, so the problem isn't outdated software. Is this some kind of gcc bug restricted to Ubuntu?

OS: Ubuntu 20.04
gcc: 9.4.0

Answer:

Your GCC is too old: you need GCC11 or later for immintrin.h to define _mm_loadu_si32.

And you need GCC11.3 or GCC12 for a non-broken version that puts the loaded bytes in the correct place in the resulting vector and is alignment / strict-aliasing safe. See GCC bug 99754.
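As a rough sketch of how code could act on those version numbers, the snippet below uses the native intrinsic only on GCC releases this answer describes as fixed (11.3 and later) and falls back to a memcpy-based load everywhere else. The names HAVE_FIXED_LOADU_SI32 and load_u32_low are made up for illustration and do not come from any header, and the guard deliberately treats clang, ICC and MSVC as "unknown" rather than guessing which of their versions are safe.

```c
#include <immintrin.h>  /* __m128i, _mm_cvtsi32_si128, _mm_loadu_si32 where available */
#include <string.h>     /* memcpy */

/* Hypothetical feature macro (not defined by any compiler or header):
   1 only for real GCC 11.3 or later, which this answer describes as fixed. */
#if defined(__GNUC__) && !defined(__clang__) && !defined(__INTEL_COMPILER) && \
    (__GNUC__ > 11 || (__GNUC__ == 11 && __GNUC_MINOR__ >= 3))
#  define HAVE_FIXED_LOADU_SI32 1
#else
#  define HAVE_FIXED_LOADU_SI32 0
#endif

/* Load 4 bytes from p into the low element of a vector, zeroing the rest. */
static __m128i load_u32_low(const void *p)
{
#if HAVE_FIXED_LOADU_SI32
    return _mm_loadu_si32(p);       /* native intrinsic, known-good GCC only */
#else
    int tmp;
    memcpy(&tmp, p, sizeof tmp);    /* unaligned, aliasing-safe scalar load */
    return _mm_cvtsi32_si128(tmp);  /* tmp goes to element 0, upper elements are zero */
#endif
}
```

Either branch should give the same result, so callers do not need to know which one was compiled in.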

GCC and/or clang sometimes miss defining some "helper" intrinsics, only eventually getting around to them. This is one of those cases, and even worse, the first attempt at adding it was buggy. There are GCC versions out there (GCC11.0 through 11.2) which support it but mis-compile it (shuffling the dword or word into the top element after loading, instead of the bottom, because they used _mm_set instead of _mm_setr in the header implementation.)
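To illustrate the kind of mistake described above, here is a small sketch of how the argument order differs between _mm_set_epi32 and _mm_setr_epi32. This is not the actual header code from any GCC version, just a demonstration with made-up helper names: _mm_set_epi32 takes the highest element first, _mm_setr_epi32 takes the lowest element first.

```c
#include <emmintrin.h>  /* SSE2 */

/* Same argument list, opposite element order. */
static __m128i place_with_set(int x)  { return _mm_set_epi32(x, 0, 0, 0);  } /* x lands in element 3 (top) */
static __m128i place_with_setr(int x) { return _mm_setr_epi32(x, 0, 0, 0); } /* x lands in element 0 (bottom) */

/* Helpers to read back the bottom and top 32-bit elements. */
static int low_element(__m128i v) { return _mm_cvtsi128_si32(v); }
static int top_element(__m128i v) { return _mm_cvtsi128_si32(_mm_srli_si128(v, 12)); }
```

A 4-byte load is expected to behave like place_with_setr: data in the bottom element, zeros above it.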


The FP equivalent 4-byte load, __m128 _mm_load_ss(float*), has been defined forever, but is still not alignment or strict-aliasing safe in GCC's implementation like it is in other compilers. GCC's header derefs the float*, instead of using memcpy or an __attribute__((aligned(1),may_alias)) pointer type. That's GCC bug PR84508.

So unfortunately, in GCC, it's not safe to use _mm_castps_si128( _mm_load_ss( (float*)ptr )) either.
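For reference, the __attribute__((aligned(1),may_alias)) pointer-type approach mentioned above could look roughly like the sketch below. It relies on GNU C extensions, so it is an option for GCC and clang only, not MSVC. The typedef and function names are hypothetical.

```c
#include <emmintrin.h>  /* SSE2: __m128i, _mm_cvtsi32_si128 */

/* GNU C extension: an int type with alignment 1 that may alias any other type.
   The typedef name is made up for this example. */
typedef int unaligned_aliasing_int __attribute__((aligned(1), may_alias));

/* Load 4 bytes from any address into the low element, zeroing the rest. */
static __m128i movd_load_gnu(const void *p)
{
    return _mm_cvtsi32_si128(*(const unaligned_aliasing_int *)p);
}
```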


Portable implementation for older compilers

Your best bet for an aliasing-safe unaligned 4-byte load is probably this portable implementation:

#include <emmintrin.h>  // SSE2: __m128i, _mm_cvtsi32_si128
#include <string.h>     // memcpy

__m128i movd_load(void *p)
{
    int tmp;                       // int32_t on implementations that support intrinsics
    memcpy(&tmp, p, sizeof(tmp));  // unaligned aliasing-safe load
    return _mm_cvtsi32_si128(tmp);
}

This compiles nicely on GCC, clang, and MSVC (checked on Godbolt), with both old and new versions of GCC and clang: I tested GCC4.7 and GCC12 and got just the expected movd xmm0, [rdi] / ret.

But it compiles stupidly on ICC, loading into EAX and then either store/reload or movd xmm0, eax, instead of a memory source operand for movd.


This is also useful as a building-block for pmovzx / pmovsx loads (one of the significant use-cases for narrow loads into __m128i, especially unaligned and aliasing-safe loads), such as

#if defined(__SSE4_1__) || defined (_MSC_VER)
__m128i pmovzxbd_load(void *p)
{
    __m128i v = movd_load(p);
    return _mm_cvtepu8_epi32(v);  // folds the load with GCC9 or later
    // but not ICC or MSVC, or earlier GCC: they all movd into an XMM reg and pmovzxbd xmm0,xmm0
    // clang gets this right, with a mem src pmovzxbd
}
#endif

# GCC8.5 -O2 -march=skylake -mno-avx
# and MSVC19.14.  ICC 2021 is even worse, going through EAX
pmovzxbd_load:
        movd    xmm0, DWORD PTR [rdi]
        pmovzxbd        xmm0, xmm0
        ret

# GCC9.5 -O2 -march=skylake -mno-avx
# and clang
pmovzxbd_load:
        pmovzxbd        xmm0, DWORD PTR [rdi]
        ret
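The pmovzxbd_load above only exists when SSE4.1 is enabled. If a build without SSE4.1 also needs the same result, one possible fallback (an addition of mine, not part of the tested code above) is to zero-extend with two SSE2 unpacks against a zero vector. The function name is hypothetical, and I have not checked what asm the various compilers produce for it.

```c
#include <emmintrin.h>  /* SSE2 only */
#include <string.h>     /* memcpy */

/* Zero-extend 4 bytes at p into four 32-bit elements without SSE4.1. */
static __m128i pmovzxbd_load_sse2(const void *p)
{
    int tmp;
    memcpy(&tmp, p, sizeof(tmp));            /* unaligned, aliasing-safe 4-byte load */
    __m128i v    = _mm_cvtsi32_si128(tmp);   /* bytes b0..b3 in the low dword */
    __m128i zero = _mm_setzero_si128();
    v = _mm_unpacklo_epi8(v, zero);          /* interleave with zeros: bytes -> 16-bit */
    return _mm_unpacklo_epi16(v, zero);      /* interleave again: 16-bit -> 32-bit */
}
```

Interleaving with zero works because x86 is little-endian: each data byte ends up as the low byte of a wider element whose upper bytes are zero.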