I am looking for an intrinsic function that can take the 8 32-bit integers in an AVX2 register and store each of them at its own index in an array (essentially the store equivalent of _mm256_i32gather_epi32). As far as I can tell, such a function doesn't exist, but I'm not sure whether I'm just missing something, as I am new to SIMD programming.
Answer:
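You’re correct: AVX2 has gather instructions but no scatter counterpart. Scatter stores only arrived with AVX-512 (_mm256_i32scatter_epi32 requires AVX512F and AVX512VL). Here’s one possible workaround for AVX2. Note that it compiles into quite a few instructions, so if you can, restructure the algorithm to avoid scattered stores.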
#include <immintrin.h>
#include <stdint.h>

// Store 4 integers from SSE vector using offsets from another vector.
// Offsets are element indices, treated as unsigned; _mm_extract_epi32 requires SSE4.1.
inline void scatter( int* rdi, __m128i idx, __m128i data )
{
    rdi[ (uint32_t)_mm_cvtsi128_si32( idx ) ] = _mm_cvtsi128_si32( data );
    rdi[ (uint32_t)_mm_extract_epi32( idx, 1 ) ] = _mm_extract_epi32( data, 1 );
    rdi[ (uint32_t)_mm_extract_epi32( idx, 2 ) ] = _mm_extract_epi32( data, 2 );
    rdi[ (uint32_t)_mm_extract_epi32( idx, 3 ) ] = _mm_extract_epi32( data, 3 );
}

// Store 8 integers from AVX vector using offsets from another vector
inline void scatter( int* rdi, __m256i idx, __m256i data )
{
    // Low 128-bit halves first (lanes 0-3), then the high halves (lanes 4-7)
    scatter( rdi, _mm256_castsi256_si128( idx ), _mm256_castsi256_si128( data ) );
    scatter( rdi, _mm256_extracti128_si256( idx, 1 ), _mm256_extracti128_si256( data, 1 ) );
}
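
To check a scatter routine like the one above, it can help to have a plain scalar reference of the intended semantics. The sketch below is not part of the original answer: scatter_ref and scatter_spill are hypothetical names. scatter_ref writes the lanes in order, so when two lanes share an index the higher lane wins, which is also what the extract-based version does. scatter_spill is one possible alternative that stores both vectors to the stack and reuses the scalar loop; it is only compiled when the compiler defines __AVX2__ (for example with -mavx2), and whether it beats the extract-based version depends on the compiler and CPU, so measure before choosing.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

// Scalar reference (hypothetical helper): base[idx[i]] = data[i] for lanes 0..7.
// Lanes are written in order, so a duplicated index keeps the higher lane's value.
inline void scatter_ref( int* base, const std::uint32_t ( &idx )[ 8 ], const int ( &data )[ 8 ] )
{
    for( std::size_t i = 0; i < 8; ++i )
        base[ idx[ i ] ] = data[ i ];
}

#if defined(__AVX2__)
// Hypothetical alternative: spill both vectors to aligned stack arrays,
// then run the same scalar loop over them.
inline void scatter_spill( int* base, __m256i idx, __m256i data )
{
    alignas( 32 ) std::uint32_t i[ 8 ];
    alignas( 32 ) int d[ 8 ];
    _mm256_store_si256( reinterpret_cast<__m256i*>( i ), idx );
    _mm256_store_si256( reinterpret_cast<__m256i*>( d ), data );
    scatter_ref( base, i, d );
}
#endif
```

Either function expects every index to be in bounds for the destination array; neither performs any checking.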